This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Association Between Voluntary Engagement with LLM-Powered Virtual Standardized Patients and Medical Students’ Clinical Interview Performance: Retrospective Observational Study (Preprint)
Citations: 0
Authors: 7
Year: 2026
Abstract
Background: Virtual Standardized Patients (VSPs) powered by large language models (LLMs) are emerging as scalable tools to supplement human Standardized Patient (SP) training in medical education. However, their added value in settings with established SP programs remains underexplored.
Objective: To assess whether voluntary engagement with an LLM-powered VSP is associated with medical students’ SP interview scores within established in-person SP programs.
Methods: We analyzed VSP usage logs and SP assessment data from fourth-year undergraduate medical students enrolled in a 7-week diagnostics course with weekly in-person SP training sessions. Students had voluntary access to an LLM-powered VSP system. VSP usage was quantified by the number of valid sessions and the total number of question–answer (QA) pairs. The primary outcome was the composite SP interview score (0–100). Spearman and Pearson correlations, linear regression, and independent t tests were conducted.
Results: A total of 92 students generated 359 valid VSP dialogues comprising 19,380 QA pairs; the overall hallucination rate was 0.34%. Median VSP usage was 2 sessions (IQR 1–7; range 0–18). Students produced a median of 132 QA pairs (IQR 38–323; range 0–916). The mean SP interview score was 92.8 ± 2.3. VSP session frequency showed a weak but statistically significant positive correlation with SP scores (Spearman ρ = 0.25, P = .016). Linear regression indicated that each additional session was associated with a 0.15-point increase in SP score (β = 0.15; 95% CI 0.040 to 0.26; P = .011; R² = 0.070). A threshold of 7 sessions distinguished a high-use group (≥ 7 sessions, n = 22) that achieved higher scores than the low-use group (< 7 sessions, n = 70) (94.1 ± 2.0 vs. 92.5 ± 2.4; P = .013). Total QA pairs showed a moderate positive correlation with SP scores (Spearman ρ = 0.28, P = .006). Regression indicated that each additional QA pair was associated with a 0.0033-point increase in SP score (β = 0.0033; 95% CI 0.0011 to 0.0055; P = .002; R² = 0.106). Students in the high-QA group (≥ 132 pairs, n = 46) scored higher than those in the low-QA group (< 132 pairs, n = 46) (93.3 ± 2.3 vs. 92.3 ± 2.3; P = .030).
Conclusions: Within a human SP curriculum, autonomous engagement with an LLM-powered VSP was associated with small but meaningful improvements in clinical interview performance. VSPs appear to function as scalable “cognitive replay” environments that provide low-stakes, feedback-rich practice. Educational strategies should move beyond counting VSP exposures toward fostering deeper, reflective engagement with AI-based simulations.
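The kinds of analysis named in the Methods (rank correlation, simple linear regression, and a threshold-based group comparison) can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the `sessions` and `scores` values are invented for illustration and are not study data, and P values and confidence intervals are omitted because they would require a t distribution that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ols(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx, pearson(x, y) ** 2


def split_means(x, y, threshold):
    """Mean outcome for x >= threshold and for x < threshold."""
    high = [b for a, b in zip(x, y) if a >= threshold]
    low = [b for a, b in zip(x, y) if a < threshold]
    return mean(high), mean(low)


# Invented example values (NOT the study data): sessions used and SP scores.
sessions = [0, 1, 1, 2, 2, 3, 7, 9, 12, 18]
scores = [90.5, 91.0, 93.0, 92.0, 92.5, 91.5, 94.0, 93.5, 95.0, 94.5]

rho = spearman(sessions, scores)
slope, intercept, r_squared = ols(sessions, scores)
high_mean, low_mean = split_means(sessions, scores, threshold=7)

print(f"Spearman rho = {rho:.2f}")
print(f"slope = {slope:.3f} points per session, R^2 = {r_squared:.3f}")
print(f"mean score: >=7 sessions {high_mean:.1f}, <7 sessions {low_mean:.1f}")
```

In a sketch like this, the regression slope corresponds to the reported β (points of SP score per additional session or QA pair), and the two group means correspond to the high-use versus low-use comparison.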