OpenAlex · Updated hourly · Last updated: 18 May 2026, 11:39

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Clinical Plausibility in Large Language Model Robustness Testing for Medicine: A Scoping Review

2026 · 0 citations · Journal of Medical Systems · Open Access
Citations: 0 · Authors: 5 · Year: 2026

Abstract

Large language models (LLMs) show promise in medical applications, yet their translation into clinical practice requires rigorous validation. Current robustness testing often employs adversarial approaches borrowed from AI safety, raising questions about their alignment with authentic clinical scenarios. This review aimed to systematically map the methodologies used for robustness testing of LLMs in medical contexts and to assess their clinical plausibility. A scoping review was conducted following PRISMA-ScR guidelines, searching PubMed, Embase, Web of Science, IEEE Xplore, ACM Digital Library, arXiv, and medRxiv from January 2023 to September 2025. Two independent physician reviewers screened 5,331 articles, extracting data on testing methodologies, medical domains, expert involvement, and clinical plausibility. Thirty-three studies met the inclusion criteria, most of them published in 2025 (82%). The most common robustness testing approaches were misleading prompts (49%) and adversarial prompts (39%). Only 33% of studies designed tests that clearly mimicked plausible clinical scenarios. While 58% reported expert involvement, the depth of integration varied considerably. Studies predominantly addressed mixed medical domains (73%) rather than specialized fields. The emerging literature suggests that LLM robustness testing in medicine often emphasizes technical vulnerability detection, with fewer studies examining clinically plausible scenarios of routine use. Future frameworks should complement adversarial testing with clinically grounded, longitudinal, and specialty-focused evaluations to support deployment-relevant inference.

Similar works