This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Assessing the quality of AI-generated and physician-written discharge summaries: evaluation of an EHR-integrated tool in a Dutch academic hospital

2026 · 0 citations · EBioMedicine · Open Access

Abstract

BACKGROUND: Large language models (LLMs) could reduce the administrative burden of clinical care by generating discharge summaries. Most prior evaluations have been limited to drafts, small cohorts, or non-integrated settings, and robust validation of fully automated, EHR-integrated systems in real-world practice is lacking.

METHODS: This study was conducted in April 2025 at a Dutch academic hospital. A total of 292 paired discharge summaries from multiple departments were evaluated, each pair consisting of a physician-written and an LLM-generated version. Eight blinded clinicians independently assessed the summaries on a 5-point Likert scale for completeness, correctness, and conciseness, and also scored trustworthiness. Domain and total scores were compared with Wilcoxon signed-rank tests, and interrater reliability was quantified using Gwet's AC2.

FINDINGS: Compared with physician-written summaries, LLM-generated summaries scored lower on completeness (4.50 (4.00-5.00) vs 5.00 (4.50-5.00); p < 0.001), similarly on correctness (5.00 (4.50-5.00) vs 5.00 (4.63-5.00); p = 0.14), and higher on conciseness (5.00 (4.50-5.00) vs 4.50 (4.00-5.00); p < 0.001). Total scores did not differ (14.00 (13.00-15.00) vs 14.00 (13.00-15.00); p = 0.34). Physician-written summaries were trusted by both reviewers in 279 (95.5%) cases, whereas LLM-generated summaries were trusted in 249 (85.3%) cases, partially trusted in 34 (11.6%), and rejected in 9 (3.1%). Interrater agreement for total scores was high (AC2 0.87, 95% CI 0.83-0.90 for LLM-generated summaries; 0.85, 95% CI 0.81-0.89 for physician-written summaries).

INTERPRETATION: Discharge summaries generated by an EHR-integrated LLM achieved quality ratings comparable to physician-written documents across multiple specialties, with no difference in total scores. Unlike earlier pilot work, this study demonstrates the real-world feasibility of automated LLM use in clinical workflows at scale. With appropriate oversight and specialty-specific refinement, such systems could substantially reduce documentation burden while maintaining discharge summary quality.

FUNDING: This research did not receive a specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
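The abstract names the two statistics used in the analysis, the paired Wilcoxon signed-rank test and Gwet's AC2, without showing how they are computed. As a rough illustration only, and not the authors' analysis code, the Python sketch below implements both on toy data. It rests on assumptions not stated in the abstract: the Wilcoxon test uses a two-sided normal approximation that drops zero differences and applies the standard tie correction (statistical packages may instead compute exact p-values for small samples), and AC2 uses quadratic weights, one common choice for ordinal ratings, because the weighting scheme is not reported. The function names and all data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration. Assumed methods: normal approximation
# for the Wilcoxon test, quadratic weights for Gwet's AC2.


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test using the normal approximation.

    Zero differences are dropped, tied |differences| get average ranks, and the
    variance includes the usual tie correction. Returns (W_plus, z, p).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences (1-based), averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    # W+ is the sum of ranks belonging to positive differences (x > y).
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    tie_sizes = Counter(abs(d) for d in diffs).values()
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p


def gwet_ac2(r1, r2, categories):
    """Gwet's AC2 for two raters on ordinal categories, with quadratic weights."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    q = len(categories)
    if q < 2:
        raise ValueError("need at least two categories")
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Quadratic weights: full credit on the diagonal, less for distant categories.
    w = [[1 - (k - m) ** 2 / (q - 1) ** 2 for m in range(q)] for k in range(q)]
    # Joint classification proportions p[k][m] (rater 1 says k, rater 2 says m).
    counts = [[0] * q for _ in range(q)]
    for a, b in zip(r1, r2):
        counts[idx[a]][idx[b]] += 1
    p = [[c / n for c in row] for row in counts]
    p_a = sum(w[k][m] * p[k][m] for k in range(q) for m in range(q))
    # pi_k: mean of the two raters' marginal proportions for category k.
    pi = [(sum(p[k]) + sum(p[m][k] for m in range(q))) / 2 for k in range(q)]
    t_w = sum(map(sum, w))
    p_e = t_w / (q * (q - 1)) * sum(pk * (1 - pk) for pk in pi)
    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy ratings, not data from the study.
    llm = [5, 5, 4, 5, 4.5]
    physician = [4, 4.5, 4, 4, 5]
    print(wilcoxon_signed_rank(llm, physician))
    rater_a = [5, 4, 5, 3, 4, 5]
    rater_b = [5, 4, 4, 3, 4, 5]
    print(gwet_ac2(rater_a, rater_b, categories=[1, 2, 3, 4, 5]))
```

Quadratic weights give partial credit for near-miss ratings (for example 4 vs 5), which suits ordinal Likert data; with only two categories they reduce to identity weights, and AC2 then coincides with Gwet's AC1.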

Topics

Artificial Intelligence in Healthcare and Education; Electronic Health Records Systems; Machine Learning in Healthcare
