This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluation of multimodal large language models for pneumothorax assessment in real-world clinical scenarios
Citations: 0
Authors: 2
Year: 2026
Abstract
Pneumothorax is a potentially life-threatening clinical condition requiring rapid diagnosis and intervention. It may present with symptoms such as dyspnoea, cough, chest pain, tachycardia, and tachypnoea. Heavy workloads and the technical limitations of chest radiographs mean that clinicians may need support during evaluation. Recently, interest has grown in using large language models (LLMs) that can process both text and medical images as supportive tools in diagnosis. This study analyzed the performance of two widely used multimodal LLMs, ChatGPT and Gemini, in diagnosing pneumothorax in a real clinical setting. Clinical information and anonymized chest X-rays were presented to both models using an eight-question evaluation file. The responses were compared with the consensus assessment of two thoracic surgeons, which was treated as the gold standard. Pneumothorax was detected in 30 of the 240 patients (12.5%). In detecting pneumothorax, ChatGPT demonstrated 100% sensitivity, 91.4% specificity, 62.5% positive predictive value (PPV), and 100% negative predictive value (NPV). Gemini had a sensitivity of 70.0%, specificity of 92.9%, PPV of 58.3%, and NPV of 95.6%. Both models performed markedly poorly in determining the side of the pneumothorax: ChatGPT had a sensitivity of 14.3%, and Gemini had a sensitivity of 10.0%. In determining the type of pneumothorax, ChatGPT demonstrated a sensitivity of 36.7% (95% CI: 21.9–54.5), while Gemini showed a sensitivity of 60.0% (95% CI: 42.3–75.4). Both models exhibited high specificity (> 91%). In assessing the need for tube thoracostomy, ChatGPT showed 57.7% sensitivity and 94.4% specificity, whereas Gemini showed 46.2% sensitivity and 95.8% specificity. NPV was high and PPV was low in both models.
Multimodal LLMs demonstrated high sensitivity in ruling out pneumothorax; however, limitations in agreement, localisation, and management decisions restrict their role to cautious, supportive use under specialist supervision.
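The detection figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only. The confusion-matrix counts are not reported in the abstract. They are back-calculated here from the stated prevalence (30 of 240) and the rounded percentages, so they are an assumption. The function names `diagnostic_metrics` and `wilson_interval` are hypothetical helpers. The abstract does not name its confidence-interval method either. The Wilson score interval is used because it happens to reproduce the reported intervals for 11/30 and 18/30, which is again an inference rather than something the source states.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, fn, tn):
    """Return the four standard diagnostic accuracy measures as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among diseased
        "specificity": tn / (tn + fp),  # true negatives among healthy
        "ppv": tp / (tp + fp),          # diseased among positive calls
        "npv": tn / (tn + fn),          # healthy among negative calls
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# ASSUMPTION: counts back-calculated from the abstract's percentages
# (30 pneumothorax cases, 210 non-cases); not reported by the authors.
chatgpt = diagnostic_metrics(tp=30, fp=18, fn=0, tn=192)
gemini = diagnostic_metrics(tp=21, fp=15, fn=9, tn=195)

for name, metrics in (("ChatGPT", chatgpt), ("Gemini", gemini)):
    summary = ", ".join(f"{k} {v * 100:.1f}%" for k, v in metrics.items())
    print(f"{name}: {summary}")

# ASSUMPTION: type-classification sensitivities read as 11/30 and 18/30.
for name, hits in (("ChatGPT", 11), ("Gemini", 18)):
    low, high = wilson_interval(hits, 30)
    print(f"{name} type sensitivity {hits / 30 * 100:.1f}% "
          f"(95% CI: {low * 100:.1f}-{high * 100:.1f})")
```

Under these assumed counts, the low PPV follows directly from the 12.5% prevalence: a false-positive rate of under 9% across 210 non-cases still yields 18 false positives against 30 true positives for ChatGPT.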