OpenAlex · Updated hourly · Last updated: 31.03.2026, 01:53

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Evaluation of multimodal large language models for pneumothorax assessment in real-world clinical scenarios

2026 · 0 citations · BMC Pulmonary Medicine · Open Access

Citations: 0
Authors: 2
Year: 2026

Abstract

Pneumothorax is a potentially life-threatening condition that requires rapid diagnosis and intervention. It may present with dyspnoea, cough, chest pain, tachycardia, and tachypnoea. Heavy clinical workloads and the technical limitations of chest radiographs mean that clinicians may need support during evaluation. Recently, interest has grown in using multimodal large language models (LLMs), which can process both text and medical images, as supportive tools in diagnosis. This study analyzed the performance of two widely used multimodal LLMs, ChatGPT and Gemini, in diagnosing pneumothorax in a real clinical setting. Clinical information and anonymized chest X-rays were presented to both models using an eight-question evaluation file. The responses were compared with the consensus assessment of two thoracic surgeons, which served as the gold standard. Pneumothorax was present in 30 of the 240 patients (12.5%). In detecting pneumothorax, ChatGPT demonstrated 100% sensitivity, 91.4% specificity, 62.5% positive predictive value (PPV), and 100% negative predictive value (NPV). Gemini had a sensitivity of 70.0%, specificity of 92.9%, PPV of 58.3%, and NPV of 95.6%. Both models performed poorly in determining the side of the pneumothorax: ChatGPT had a sensitivity of 14.3%, and Gemini had a sensitivity of 10.0%. In determining the type of pneumothorax, ChatGPT demonstrated a sensitivity of 36.7% (95% CI: 21.9–54.5), while Gemini showed a sensitivity of 60.0% (95% CI: 42.3–75.4). Both models exhibited high specificity (>91%). In assessing the need for tube thoracostomy, ChatGPT showed 57.7% sensitivity and 94.4% specificity, whereas Gemini showed 46.2% sensitivity and 95.8% specificity. NPV was high and PPV was low in both models.
Multimodal LLMs demonstrated high sensitivity in ruling out pneumothorax; however, limitations in agreement, localisation, and management decisions restrict their role to cautious, supportive use under specialist supervision.
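The detection figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not report the underlying 2×2 counts, so the true/false positive and negative counts used here are reconstructed assumptions chosen to match the reported percentages (30 positives and 210 negatives among 240 patients). Likewise, the abstract does not say how the confidence intervals were computed; the Wilson score interval is used here as an assumption, applied to assumed counts of 11/30 and 18/30 for the type-determination sensitivities. The function names are hypothetical.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# ASSUMPTION: counts reconstructed from the reported percentages,
# not taken from the paper (30 positives, 210 negatives).
chatgpt = diagnostic_metrics(tp=30, fp=18, fn=0, tn=192)
gemini = diagnostic_metrics(tp=21, fp=15, fn=9, tn=195)

for name, metrics in (("ChatGPT", chatgpt), ("Gemini", gemini)):
    print(name, {k: round(100 * v, 1) for k, v in metrics.items()})

# ASSUMPTION: type-determination sensitivities taken as 11/30 and 18/30.
for name, hits in (("ChatGPT", 11), ("Gemini", 18)):
    lo, hi = wilson_interval(hits, 30)
    print(name, f"{100 * hits / 30:.1f}% (95% CI: {100 * lo:.1f}-{100 * hi:.1f})")
```

Under these assumed counts, the computed values agree with the reported detection metrics and confidence intervals after rounding to one decimal place. The example also shows why PPV stays modest at 12.5% prevalence: even with specificity above 91%, the 15 to 18 false positives are large relative to the 21 to 30 true positives.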

Topics

COVID-19 diagnosis using AI · Artificial Intelligence in Healthcare and Education · Phonocardiography and Auscultation Techniques