This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
A prompt framework for enhancing LLM-based explainability of medical machine learning models: an intensive care unit application
Citations: 0
Authors: 8
Year: 2025
Abstract
Explainable AI (XAI) techniques such as SHAP provide valuable insights into machine learning model predictions by quantifying feature contributions. However, interpreting these quantitative outputs remains unintuitive for many clinicians, hindering practical adoption in clinical decision-making. This exploratory feasibility study proposes and evaluates a prompting framework designed to guide large language models (LLMs) in generating consistent, clinically relevant explanations from SHAP values. We developed a structured zero-shot prompting framework incorporating variable definitions, safety principles, and a three-step reasoning process: (1) key risk factors, (2) prediction-outcome reconciliation, and (3) clinical recommendations. This framework guided GPT-4 in explaining SHAP-based predictions from an ICU extubation failure model trained on MIMIC-III data. The framework’s performance was assessed using quantitative consistency metrics (entropy, fidelity) and a qualitative clinician survey (n = 7). The LLM achieved a fidelity of 0.783 and an entropy of 0.226, indicating consistent structured reasoning. On a 5-point Likert scale, clinicians rated the LLM-generated explanations as helpful (mean: 3.94 ± 0.48) and safe (4.21 ± 0.68). However, critical care specialists assigned lower scores than non-critical care physicians (helpfulness: 3.48 vs. 4.29; safety: 3.45 vs. 4.79), suggesting domain-specific caution in perceived utility. A structured prompting framework is thus feasible for leveraging LLMs to enhance the clinical interpretability of SHAP explanations, and this methodological approach warrants further investigation and refinement before broader clinical application.
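The abstract summarizes the framework's three reasoning steps but does not reproduce the prompt itself. The Python sketch below illustrates what such a structured zero-shot prompt could look like, assembled from the elements the abstract names (variable definitions, safety principles, three-step reasoning). The function name build_explanation_prompt, the feature names (RSBI, PaO2/FiO2, GCS), and all example values are hypothetical assumptions for illustration, not the authors' implementation.

# A minimal sketch, assuming details not given in the abstract.
def build_explanation_prompt(shap_contributions: dict[str, float],
                             predicted_risk: float,
                             variable_definitions: dict[str, str]) -> str:
    """Assemble a zero-shot prompt combining variable definitions, safety
    principles, and the three-step reasoning structure from the abstract."""
    definitions = "\n".join(
        f"- {name}: {desc}" for name, desc in variable_definitions.items())
    # List contributions with the strongest absolute SHAP values first.
    contributions = "\n".join(
        f"- {name}: SHAP value {value:+.3f}"
        for name, value in sorted(shap_contributions.items(),
                                  key=lambda kv: -abs(kv[1])))
    return f"""You are assisting clinicians in interpreting a machine learning
prediction of extubation failure risk for an ICU patient.

Variable definitions:
{definitions}

Model output: predicted probability of extubation failure = {predicted_risk:.2f}
Feature contributions (positive SHAP values increase predicted risk):
{contributions}

Safety principles: do not issue definitive treatment orders, state
uncertainty explicitly, and defer to clinical judgment.

Respond in three steps:
(1) Key risk factors: name the features driving this prediction.
(2) Prediction-outcome reconciliation: explain how these contributions
    combine to support the predicted risk.
(3) Clinical recommendations: suggest cautious, non-prescriptive next
    steps for the care team.
"""

# Hypothetical usage with made-up values:
print(build_explanation_prompt(
    shap_contributions={"RSBI": 0.12, "PaO2/FiO2": 0.08, "GCS": -0.05},
    predicted_risk=0.64,
    variable_definitions={
        "RSBI": "rapid shallow breathing index",
        "PaO2/FiO2": "oxygenation ratio",
        "GCS": "Glasgow Coma Scale score",
    },
))

Ordering the features by absolute SHAP value mirrors the common practice of surfacing the strongest drivers first; how the authors actually order or filter contributions is not stated in the abstract.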
Related works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,463 citations
Generative Adversarial Nets
2014 · 19,843 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,259 citations
"Why Should I Trust You?"
2016 · 14,314 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,138 citations