This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Real-world deployment and evaluation of PEri-operative AI CHatbot (PEACH): a large language model chatbot for peri-operative medicine
2
Citations
13
Authors
2025
Year
Abstract
INTRODUCTION: Large language models are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PEri-operative AI CHatbot (PEACH). It was developed by embedding 35 institutional peri-operative protocols into a secure large language model environment, with iterative prompt engineering and internal testing to ensure clinical relevance and accuracy.
METHODS: The system was tested in a silent deployment using real-world data. Accuracy, safety and usability were assessed. Accuracy was evaluated by comparing the responses from PEACH against institutional guidelines and expert consensus. Deviations and hallucinations were categorised by potential harm, and user feedback was evaluated using Davis' Technology Acceptance Model. After the initial silent deployment, PEACH was updated with minor amendments to one of the protocols.
RESULTS: In total, 240 real-world clinical iterations were evaluated. First-generation accuracy was 97.5% (78/80), with an overall accuracy of 96.7% (232/240) across three iterations. In the updated PEACH, accuracy improved to 97.9% (235/240), with a statistically significant difference from the null hypothesis of 95% accuracy (p = 0.018). Hallucinations and deviations were minimal (1/240 and 2/240, respectively). Usability was high, with clinicians noting that PEACH expedited decisions in 95% of cases. The κ statistic for inter-rater reliability for PEACH was 0.772 and 0.893 between three iterations, compared with 0.610 and 0.784 for experienced peri-operative physicians.
DISCUSSION: PEACH is an accurate, adaptable tool that enhances consistency and efficiency in peri-operative decision-making. Future research should explore scalability across specialties and its impact on clinical outcomes.
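The abstract reports 235/240 correct responses and a p-value against a null accuracy of 95%, but does not name the test used. As a minimal sketch, assuming a one-sided exact binomial test (an assumption, not the authors' stated method), the comparison could be reproduced as follows. The function name `binomial_upper_tail` is hypothetical and introduced only for illustration.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, null_rate: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, null_rate)."""
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from the abstract (updated PEACH).
correct, total = 235, 240
null_accuracy = 0.95  # null hypothesis stated in the abstract

accuracy = correct / total
# Assumption: one-sided test of "accuracy is greater than 95%".
p_value = binomial_upper_tail(correct, total, null_accuracy)

print(f"accuracy = {accuracy:.3f}, one-sided p = {p_value:.3f}")
```

Under this assumption the upper-tail probability comes out at roughly 0.018, which is consistent with the reported value, although that agreement alone does not confirm which test the authors applied.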
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,707 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,613 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,159 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,875 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations