This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Evaluating Dutch-Language Ambient Listening in Simulated Clinical Encounters: Comparing Three Providers in a Multi-Speaker, Multi-Dialect Study (Preprint)

2026 · 0 citations · Open Access

Abstract

BACKGROUND: Clinicians spend a substantial amount of time on Electronic Health Record (EHR) documentation, often at the expense of patient interaction. Ambient listening technology uses artificial intelligence to passively record and summarize clinical encounters. While initial studies are promising, there is limited evidence on system performance in complex, non-English settings.

OBJECTIVE: To compare the documentation performance of three commercially available ambient listening systems in simulated Dutch-language outpatient consultations by assessing note completeness, correctness, and conciseness under predefined linguistic and interactional challenges.

METHODS: Standardized audio recordings of ten scripted physician–patient interactions in two specialties were used. Scenarios included multi-speaker dynamics (patient companion), conversational disruptions (nurse interruption), evasive patient communication, and a regional dialect (Gronings). Three distinct AI documentation systems (Provider A, Provider B, and Provider C) processed the audio files. Eight human raters evaluated the resulting AI-generated notes against reference summaries for completeness, conciseness, and correctness using a 5-point ordinal scale. Inter-rater agreement was assessed using Gwet’s AC2. System-level technical characteristics were assessed alongside clinical performance to aid interpretation of between-vendor differences.

RESULTS: Across 351 ratings on a 1-5 scale, overall inter-rater agreement was high (Gwet’s AC2 = 0.827). Mean scores were tightly clustered across providers (Provider C: 4.26, Provider B: 4.00, Provider A: 3.82). Mean scores were higher in Otolaryngology (4.36) than in Surgical Oncology (3.68). Across scoring domains, correctness received the highest mean score (4.21) and completeness the lowest (3.81). Mean scores also varied across script scenarios: dialect-specific scenarios showed the lowest mean score (3.77) and the greatest variability across providers. Median summary generation times ranged from 13.5 seconds (Provider C) to 22.0 seconds (Provider B).

CONCLUSIONS: The ambient listening systems evaluated performed well in simulated Dutch clinical encounters, even under conditions simulating common conversational challenges. While accuracy is generally high, performance is sensitive to linguistic variation. Future deployment studies must prioritize linguistic equity, real-world validation of efficiency gains, and evaluation of both clinician and patient perception to understand how these systems influence consultation dynamics and care delivery across diverse patient populations.
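The abstract reports inter-rater agreement as Gwet’s AC2 but does not say how the coefficient was computed. As a rough illustration only, the sketch below follows the standard multi-rater formulation of AC2 with agreement weights. The choice of linear weights, the function names `agreement_weights` and `gwet_ac2`, and the toy ratings are all assumptions made for this example; none of them come from the study.

```python
from typing import List, Optional, Sequence


def agreement_weights(q: int, kind: str = "linear") -> List[List[float]]:
    """Build a q x q weight matrix; 1.0 on the diagonal, partial credit off it."""
    if q < 2:
        raise ValueError("need at least two categories")
    if kind == "identity":  # no partial credit: AC2 reduces to AC1
        return [[1.0 if k == l else 0.0 for l in range(q)] for k in range(q)]
    if kind == "linear":  # assumption: the study's weighting is not stated
        return [[1.0 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]
    if kind == "quadratic":
        return [[1.0 - ((k - l) / (q - 1)) ** 2 for l in range(q)] for k in range(q)]
    raise ValueError(f"unknown weight kind: {kind}")


def gwet_ac2(
    ratings: Sequence[Sequence[Optional[int]]],
    categories: Sequence[int],
    kind: str = "linear",
) -> float:
    """Gwet's AC2 for several raters; rows are rated items, None marks a missing rating."""
    q = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    w = agreement_weights(q, kind)

    # counts[i][k] = number of raters who put item i into category k
    counts = []
    for row in ratings:
        c = [0] * q
        for r in row:
            if r is not None:
                c[index[r]] += 1
        if sum(c) > 0:
            counts.append(c)

    # Weighted observed agreement, averaged over items with at least two ratings
    pa_terms = []
    for c in counts:
        r_i = sum(c)
        if r_i < 2:
            continue
        weighted = [sum(w[k][l] * c[l] for l in range(q)) for k in range(q)]
        pa_terms.append(
            sum(c[k] * (weighted[k] - 1.0) for k in range(q)) / (r_i * (r_i - 1))
        )
    if not pa_terms:
        raise ValueError("no item was rated by two or more raters")
    p_a = sum(pa_terms) / len(pa_terms)

    # Chance agreement from the average category prevalences
    n = len(counts)
    pi = [sum(c[k] / sum(c) for c in counts) / n for k in range(q)]
    t_w = sum(sum(row) for row in w)
    p_e = t_w / (q * (q - 1)) * sum(p * (1.0 - p) for p in pi)

    return (p_a - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy data (hypothetical): 4 notes, 3 raters each, 1-5 scale
    toy = [[4, 4, 5], [3, 4, 4], [5, 5, 5], [2, 3, None]]
    print(round(gwet_ac2(toy, categories=[1, 2, 3, 4, 5]), 3))
```

With `kind="identity"` the same function yields the unweighted AC1, which gives no partial credit for near-misses on an ordinal scale.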

Topics

Simulation-Based Education in Healthcare; Electronic Health Records Systems; Artificial Intelligence in Healthcare and Education
