OpenAlex · Updated hourly · Last updated: 2026-05-11, 02:41

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Utility of Multimodal Large Language Models in Analyzing Chest X-Rays with Incomplete Contextual Information

2025 · 0 citations · Healthcare Informatics Research · Open Access
Open full text at the publisher

Citations: 0
Authors: 3
Year: 2025

Abstract

OBJECTIVES: Large language models (LLMs) are increasingly used in clinical practice, but their performance can deteriorate when radiology reports are incomplete. We evaluated whether multimodal LLMs (integrating text and images) could enhance accuracy and interpretability in chest radiography reports, thereby improving their utility for clinical decision support. Specifically, we aimed to assess the robustness of LLMs in generating accurate impressions from chest radiography reports when provided with incomplete data, and whether multimodal input could mitigate performance loss. METHODS: We analyzed 300 radiology image-report pairs from the MIMIC-CXR database. Three LLMs (OpenFlamingo, MedFlamingo, and IDEFICS) were tested in text-only and multimodal formats. Chest X-ray impressions were generated from complete text reports and then regenerated after systematically removing 20%, 50%, and 80% of the text. The effect of adding images was evaluated using chest X-rays, and model performance was compared using three statistical methods. Hallucination rates were quantified. RESULTS: In the text-only setting, OpenFlamingo, MedFlamingo, and IDEFICS demonstrated comparable performance (ROUGE-L: 0.23 vs. 0.21 vs. 0.21; F1RadGraph: 0.20 vs. 0.16 vs. 0.16; F1CheXbert: 0.49 vs. 0.41 vs. 0.41), with OpenFlamingo performing best on complete text (p < 0.001). All models exhibited performance decline with incomplete data. However, multimodal input significantly improved the performance of MedFlamingo and IDEFICS (p < 0.001), equaling or surpassing OpenFlamingo even under incomplete text conditions. Regarding hallucination, MedFlamingo showed a lower false-negative rate in multimodal compared with unimodal use, while false-positive rates were similar. CONCLUSIONS: LLMs may produce suboptimal outputs when radiology data are incomplete, but multimodal LLMs enhance reliability and may strengthen clinical decision-making support.
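The abstract reports ROUGE-L as one of its three evaluation metrics. As a minimal sketch (not the authors' implementation, which may use a different tokenizer or library), ROUGE-L F1 compares a generated impression against a reference via the longest common subsequence (LCS) of their tokens:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between a reference impression and a generated impression.

    Precision = LCS / candidate length, recall = LCS / reference length;
    F1 is their harmonic mean. Whitespace tokenization for simplicity.
    """
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, against the reference impression "no acute cardiopulmonary process", the candidate "no acute process" scores roughly 0.86 (LCS of 3 tokens, precision 1.0, recall 0.75). The scores in the abstract (around 0.2) reflect averages over 300 much longer report pairs.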

Similar works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Machine Learning in Healthcare