This is an overview page with metadata for this scientific work. The full article is available from the publisher.
1045: USING LARGE LANGUAGE MODELS TO EXPLAIN ALERTS FROM A PEDIATRIC RISK STRATIFICATION MODEL
Citations: 0
Authors: 12
Year: 2026
Abstract
Introduction: Hospitalized children who experience critical events, such as intubation or the administration of vasoactive drugs, are at increased risk for mortality and morbidity. We recently developed pCREST, a machine learning model that continuously predicts the risk of pediatric critical events across ED, ward, and ICU settings (Strutz et al., JAMA Network Open, 2025). Here, we develop new Large Language Model (LLM)-based algorithms that use a patient’s EHR data to generate text-based explanations for pCREST alerts.

Methods: We conducted a retrospective analysis of pediatric admissions to the University of Wisconsin-Madison (2009-2020). A pCREST score cutoff at the 90th percentile was used to identify children at risk for critical events within the next 12 hours. We developed two Mixtral 8x7B-Instruct LLM pipelines to generate explanations for at-risk alerts. The first (LLM-1 + Transformer) summarized unstructured notes documented in the 6 hours preceding a pCREST score, and a transformer model with label-aware attention was trained to identify key phrases important to pCREST alert generation. The second (LLM-2) was prompt-engineered to generate text summaries from vitals and labs recorded in the 6 hours preceding a pCREST score, with explicit instructions to identify signs of clinical deterioration. An o3-mini model was used as a judge to evaluate LLM-1 summaries against the source notes using the validated PDSQI-9 tool on a 5-point Likert scale.

Results: Among 40,498 admissions, 7,436 (18.36%) had at least one at-risk pCREST alert during their stay. LLM-1 outputs scored highly on the PDSQI-9, with median (IQR) scores ranging from 4 (4, 5) to 5 (4, 5) across the attributes of accuracy, thoroughness, usefulness, organization, and comprehensibility. The transformer model identified clinically relevant terms, such as “aortic,” “lethargy,” and “tachypnea,” as important for pCREST alerts. For a 2-year-old ICU sample patient, LLM-2 indicated a “concerning spike in respiratory rate (37.0 to 42.0),” which matched a Shapley-based analysis identifying the respiratory rate of 42 as important for the pCREST alert.

Conclusions: Our study lays the foundation for LLM-based pipelines that generate explainable summaries for hospitalized children identified as high-risk, potentially improving clinical decision-making.
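The two statistics that anchor the Methods and Results — flagging admissions whose pCREST score reaches the 90th-percentile cutoff, and summarizing PDSQI-9 Likert ratings as median (IQR) — can be sketched as follows. This is a minimal illustration only; the function names (`percentile_cutoff`, `flag_at_risk`, `median_iqr`) are hypothetical and not taken from the study, which does not publish its implementation.

```python
from statistics import quantiles, median

def percentile_cutoff(scores, pct=90):
    """Return the pct-th percentile of a pooled list of pCREST scores."""
    # quantiles(n=100) yields the 1st..99th percentile cut points
    return quantiles(scores, n=100)[pct - 1]

def flag_at_risk(admission_scores, cutoff):
    """An admission is 'at risk' if any of its pCREST scores reaches the cutoff."""
    return any(s >= cutoff for s in admission_scores)

def median_iqr(likert_scores):
    """Median and interquartile range, as reported for each PDSQI-9 attribute."""
    q1, _, q3 = quantiles(likert_scores, n=4)
    return median(likert_scores), (q1, q3)
```

With 5-point Likert ratings, `median_iqr([4, 4, 4, 4, 4, 5, 5, 5])` yields a median of 4 with an IQR of (4, 5), matching the format of the reported scores.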
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,391 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,257 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,685 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,501 citations