This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Understanding Surgical Complications in Clinical Text: Automated Clavien–Dindo Grading using a Zero-Shot Large Language Model Approach in a Collective of Liver Surgery Patients (Preprint)
0
Citations
14
Authors
2026
Year
Abstract
BACKGROUND: The standardized extraction of postoperative complications from unstructured routine clinical documentation remains a major unresolved challenge in digital surgery and health informatics. Although the Clavien–Dindo classification is the established standard for grading postoperative complications, its application in routine clinical documentation is largely implicit and unstructured.

OBJECTIVE: To assess the capability of open-weight and proprietary large language models (LLMs) to classify postoperative complications according to the Clavien–Dindo system using discharge letters, benchmarked against expert consensus annotations.

METHODS: We analyzed discharge letters from 650 surgical cases of patients who underwent hepatobiliary surgery between 2010 and 2024. The cohort included Grade I–II complications in 24%, Grade III–IV in 19%, and Grade V (death) in 6% of patients. Representative open-weight (Qwen 3, Llama 3.3, Ministral 3, GPT-OSS) and proprietary (GPT 5.1, Gemini 3 Pro) LLMs were prompted to infer complication grades directly from the discharge letters. Model performance was evaluated against expert assessment using accuracy and a detailed deviation analysis.

RESULTS: All models were capable of identifying and classifying complications from the unstructured documentation in the discharge letters. On the full 650-case dataset, open-weight models achieved accuracies of up to 0.775 for fine-grained grade prediction and 0.94 for binary classification. On a balanced 50-case subset, proprietary models achieved the highest performance, with accuracies of 0.78 for fine-grained grading and 0.98 for binary classification. By comparison, open-weight models reached accuracies of up to 0.70 for fine-grained grading and 0.94 for binary classification, with lower computational requirements. An ensemble approach yielded additional gains in classification performance.

CONCLUSIONS: LLMs can accurately classify postoperative complications from discharge letters, enabling scalable and objective monitoring of surgical outcomes. Their use may reduce manual abstraction workload and promote consistent, data-driven quality assessment in surgical care.
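The evaluation described in the abstract (accuracy against expert labels, a binary view of the grades, a deviation analysis, and an ensemble) can be illustrated with a minimal Python sketch. Everything specific here is an assumption, not the authors' method: the label set without sub-grades, the "any complication" threshold for the binary task, majority voting as the ensemble rule, and all function names and example labels are hypothetical.

```python
from collections import Counter

# Assumed label set: "0" = no complication, then Clavien-Dindo grades I to V.
# Sub-grades (IIIa/IIIb, IVa/IVb) are left out to keep the sketch short.
GRADES = ["0", "I", "II", "III", "IV", "V"]
RANK = {grade: index for index, grade in enumerate(GRADES)}


def accuracy(predicted, reference):
    """Fraction of cases where the predicted label equals the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def to_binary(grades, threshold="I"):
    """Map grades to True/False; the threshold is an assumption (any complication)."""
    return [RANK[g] >= RANK[threshold] for g in grades]


def deviation_counts(predicted, reference):
    """Count signed grade differences (positive = model graded more severely)."""
    return Counter(RANK[p] - RANK[r] for p, r in zip(predicted, reference))


def majority_vote(per_model_predictions):
    """Per-case majority vote across models; ties go to the more severe grade."""
    ensemble = []
    for votes in zip(*per_model_predictions):
        counts = Counter(votes)
        top = max(counts.values())
        tied = [g for g, c in counts.items() if c == top]
        ensemble.append(max(tied, key=RANK.__getitem__))
    return ensemble


# Made-up example labels, for illustration only.
expert = ["0", "II", "III", "V", "I"]
model_a = ["0", "II", "IV", "V", "0"]
model_b = ["0", "I", "III", "V", "I"]
model_c = ["I", "II", "III", "V", "I"]

fine_acc = accuracy(model_a, expert)
binary_acc = accuracy(to_binary(model_a), to_binary(expert))
deviations = deviation_counts(model_a, expert)
ensemble = majority_vote([model_a, model_b, model_c])
ensemble_acc = accuracy(ensemble, expert)
```

Raising `threshold` (for example to `"III"`) would turn the binary task into "major complication vs. not"; the abstract does not say which definition was used.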
Similar works
"Why Should I Trust You?"
2016 · 14,384 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,719 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,259 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,688 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,434 citations