This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Artificial Intelligence in Triaging Patient Questions: An Evaluation of a Large Language Model for Distal Radius Fractures
0 citations · 8 authors · 2025
Abstract
INTRODUCTION: Large language models (LLMs) are promising tools for clinical decision support but require thorough validation to ensure safety and reliability. This study assessed a knowledge and intelligence messaging interface (KIMI; RevelAi Health), an LLM enhanced with retrieval-augmented generation configured with American Academy of Orthopaedic Surgeons guidelines for distal radius fracture management and a persistent system-prompt layer. The goal was to evaluate KIMI's efficacy in acuity triaging and generating appropriate patient-facing responses for distal radius fracture management.
METHODS: We analyzed KIMI-generated responses to 100 simulated patient queries. Four clinical experts independently assessed responses for guideline concordance, safety, clarity, and acuity. Probabilities for adequate scoring in all domains were modeled. Bayesian mixed-effects logistic regression and ordered logistic regression models were used for binary and ordinal scoring outcomes, respectively, to account for repeated measures and within-reviewer correlations.
RESULTS: Reviewer evaluations of KIMI responses demonstrated high performance across safety and quality domains. Posterior average probability of responses being rated as safe was 94.2% (95% credible interval [CI]: 91.2 to 96.9), as concordant was 88.7% (95% CI: 85.0 to 92.0), and as clear was 93.7% (95% CI: 90.5 to 96.5). Posterior average probability of exact agreement between reviewer-assigned and LLM-assigned acuity levels was 62.9% (95% CI: 58.0 to 67.7). Surgical queries were associated with slightly higher safety ratings (95.4% versus 91.3%) and acuity agreement (63.9% versus 60.6%) than nonsurgical queries. Query category markedly influenced acuity agreement. LLM-assigned acuity was markedly associated with reviewer-assigned acuity across all models even when adjusting for both query type and category (odds ratio = 2.66; 95% CI: 1.81 to 3.83).
DISCUSSION: KIMI generated responses that were generally safe, clinically concordant, and clearly communicated. These findings support the feasibility of deploying enhanced LLMs for asynchronous patient engagement in low-to-moderate risk care coordination settings.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,774 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,685 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,244 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations