Evaluating the Performance of ChatGPT for Automated Surgical Chart Review
Citations: 0
Authors: 8
Year: 2026
Abstract
PURPOSE: Manual chart review has become an increasingly time-consuming step in surgical outcomes research, often requiring 30-45 minutes per patient to extract demographics, comorbidities, operative details, and postoperative outcomes. Operative notes in microsurgery are particularly challenging because of their length, complexity, and variability. Critical details, such as recipient and donor vessel selection, anastomotic technique, flap type, and laterality, are often embedded in unstructured narrative text, which contributes to inconsistency in manual abstraction. Recent advances in natural language processing, particularly large language models, offer a potential way to automate and standardize clinical data extraction quickly and reliably. This study evaluates the performance of our institution's HIPAA-secure ChatGPT for automated chart review in patients undergoing robotic microvascular free flap surgery.

METHODS: We evaluated our institution's HIPAA-compliant UltraVioletAI ChatGPT (GPT-4o) model for automated chart review of patients undergoing microvascular free flap reconstruction with the Symani Microsurgical Robot. For each surgical admission, the model was given three note types: the history and physical (H&P), the operative note, and the discharge summary. ChatGPT extracted structured variables including patient demographics, comorbidities, operative details (flap type, operative time, anastomoses, nerve coaptations), and postoperative outcomes (length of stay; medical, surgical, flap-related, and donor-site complications; and reoperations). Model outputs were evaluated for accuracy, completeness, handling of missing data, hallucinations, and contextual specificity. Descriptive statistics were summarized as means, and note types were compared using one-way ANOVA.

RESULTS: A total of 27 patients were analyzed, encompassing 81 distinct clinical notes. ChatGPT demonstrated an overall mean accuracy of 88.2 ± 13.7% across all notes.
For data deemed not applicable, ChatGPT responded correctly 92.3 ± 15.9% of the time. Hallucinations, defined as fabricated or incorrect statements, were infrequent (3.7 ± 6.6%), and output completeness was nearly perfect (99.9 ± 0.5%). ChatGPT maintained appropriate contextual specificity in 98.5 ± 10.7% of notes. When stratified by note type (H&P, operative note, and discharge summary), there were no significant differences in accuracy, handling of non-applicable data, completeness, or contextual specificity (all p > 0.05). However, hallucination rates differed significantly across note types (H&P 6.5% vs. operative note 1.5% vs. discharge summary 3.2%, p = 0.016), with the H&P having the highest rate.

CONCLUSION: Our institution's HIPAA-secure ChatGPT (GPT-4o) model demonstrated high accuracy, specificity, and completeness in automated review of microsurgical cases, with minimal hallucinations. Large language models can serve as a reliable and efficient tool for clinical data extraction, particularly in studies involving complex microsurgical operative notes and multi-note chart reviews, substantially reducing the time burden and variability inherent to manual chart review.

*Source: https://ps-rc.org/meeting/Program/2026/64.cgi*
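The abstract reports per-metric percentages as mean ± spread across notes and compares note types with a one-way ANOVA, but it does not publish its scoring code. The sketch below is a hypothetical illustration of that kind of evaluation, not the authors' pipeline. It assumes a reviewer labels each extracted field per note (the labels `"correct"`/`"incorrect"` and the field names are invented here), that per-note accuracy is the percentage of fields labeled correct, and that "±" means the sample standard deviation. All numbers in the usage section are made up.

```python
# Hypothetical sketch (NOT the study's code): per-note field scoring,
# mean ± sample-SD summaries, and a one-way ANOVA F statistic by note type.
# Field names, label values, and example numbers are illustrative assumptions.
from statistics import mean, stdev


def note_accuracy(field_labels: dict[str, str]) -> float:
    """Percent of extracted fields a reviewer labeled 'correct' for one note."""
    labels = list(field_labels.values())
    return 100.0 * sum(label == "correct" for label in labels) / len(labels)


def summarize(values: list[float]) -> str:
    """Format per-note percentages as mean ± sample SD (assumed meaning of '±')."""
    return f"{mean(values):.1f} ± {stdev(values):.1f}%"


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group SS: group size times squared deviation of group mean from grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared deviations of each value from its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical reviewer labels for a single operative note.
    op_note = {
        "flap_type": "correct",
        "operative_time": "correct",
        "anastomoses": "incorrect",
        "nerve_coaptations": "correct",
    }
    print(note_accuracy(op_note))  # 75.0

    # Hypothetical per-note hallucination rates (%) grouped by note type.
    hp, op, dc = [10.0, 5.0, 5.0], [0.0, 5.0, 0.0], [5.0, 0.0, 5.0]
    print(summarize(hp + op + dc))  # 3.9 ± 3.3%
    print(one_way_anova_f([hp, op, dc]))
```

The F statistic is computed by hand to keep the sketch dependency-free; in practice, `scipy.stats.f_oneway(hp, op, dc)` returns both the F statistic and the p-value, and `scipy.stats.f.sf(F, df_between, df_within)` converts an F statistic into a p-value.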