This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
TransXplainRRG: A Clinically-Validated LVLM for Chest Radiograph Report Generation
Citations: 0
Authors: 6
Year: 2025
Abstract
Recent advances in Large Vision-Language Models (LVLMs) have improved X-ray report generation, but existing approaches often neglect clinical context, overlook workflow integration, and lack explainability; the resulting interpretability and reliability concerns hinder clinical adoption. This paper addresses the dual challenges of performance and expert explainability in automated X-ray report generation for clinical AI applications. Our proposed approach, TransXplainRRG, uses an off-the-shelf Swin Transformer model together with a transformer-based text encoder that incorporates the patient's medical history to generate a radiology report. We further explore an expert-guided 'inside-out' approach that extracts only abnormal findings for radiology report refinement. By combining state-of-the-art transformer-based vision encoders, text encoders, and LLMs, this study bridges the gap between high-performance automation and the interpretability critical for clinical practice. TransXplainRRG is trained and tested on the large-scale MIMIC-CXR dataset and further evaluated on the unseen IU X-Ray dataset to demonstrate its generalizability and robustness. We use the GREEN metric to assess clinical accuracy, achieving scores of 0.327 on MIMIC-CXR and 0.605 on IU X-Ray. Although GREEN is not yet widely adopted, we apply it to several baseline models using their publicly available weights for a fair comparison, in which our model shows superior performance. It also performs competitively on standard evaluation metrics, frequently matching or surpassing state-of-the-art models. Additionally, we introduce qualitative evaluation metrics, developed from radiologists' viewpoints, to evaluate the clinical relevance of the generated reports in practical settings. Extensive quantitative and qualitative analyses further underscore the effectiveness and innovation of the TransXplainRRG framework in advancing reliable and explainable radiology report generation.