OpenAlex · Updated hourly · Last updated: 18 May 2026, 10:44

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating the Performance Of ChatGPT For Automated Surgical Chart Review

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access

Citations: 0 · Authors: 8 · Year: 2026

Abstract

PURPOSE: Manual chart review has become an increasingly time-consuming step in surgical outcomes research, often requiring 30-45 minutes per patient to extract demographics, comorbidities, operative details, and postoperative outcomes. Operative notes in microsurgery are particularly challenging due to their length, complexity, and variability. Critical details, such as recipient and donor vessel selection, anastomotic technique, flap type, and laterality, are often embedded within unstructured narrative text, contributing to inconsistency in manual abstraction. Recent advances in natural language processing, particularly large language models, offer a potential solution to automate and standardize clinical data extraction with speed and reliability. This study evaluates the performance of our institution's HIPAA-secure ChatGPT for automated chart review in patients undergoing robotic microvascular free flap surgery.

METHODS: We evaluated our institution's HIPAA-compliant UltraVioletAI ChatGPT GPT-4o model for automated chart review of patients undergoing microvascular free flap reconstruction with the Symani Microsurgical Robot. For each surgical admission, the model was provided three note types: history and physical (H&P), operative note, and discharge summary. ChatGPT extracted structured variables including patient demographics, comorbidities, operative details (flap type, operative time, anastomoses, nerve coaptations), and postoperative outcomes (length of stay; medical, surgical, flap-related, and donor-site complications; and reoperations). Model outputs were evaluated for accuracy, completeness, handling of missing data, hallucinations, and contextual specificity. Descriptive statistics were summarized as means, and comparisons across note types were performed using one-way ANOVA.

RESULTS: A total of 27 patients were analyzed, encompassing 81 distinct clinical notes. ChatGPT demonstrated an overall mean accuracy of 88.2 ± 13.7% across all notes. For data deemed not applicable, ChatGPT responded correctly 92.3 ± 15.9% of the time. Hallucinations, defined as fabricated or incorrect statements, occurred infrequently (3.7 ± 6.6%), while completeness of output was nearly perfect (99.9 ± 0.5%). When responding, ChatGPT maintained appropriate contextual specificity in 98.5 ± 10.7% of notes. When stratified by note type (H&P, operative note, and discharge summary), there were no significant differences in accuracy, handling of non-applicable data, completeness, or contextual specificity (all p > 0.05). However, the rate of hallucinations differed significantly across note types (6.5% vs. 1.5% vs. 3.2%, p = 0.016), with H&P notes having the highest rate.

CONCLUSION: Our institution's HIPAA-secure ChatGPT-4 model demonstrated high accuracy, specificity, and completeness in automated review of microsurgical cases, with minimal hallucinations. Large language models can serve as a reliable and efficient tool for clinical data extraction, particularly in studies involving complex microsurgical operative notes and multi-note chart reviews, thereby substantially reducing the time burden and variability inherent to manual chart review.

*Source: https://ps-rc.org/meeting/Program/2026/64.cgi*


Topics

Artificial Intelligence in Healthcare and Education · Surgical Simulation and Training · Radiology practices and education