This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Visual-language foundation models and artificial intelligence agents in healthcare: Bridging from technological innovation to clinical impact
Citations: 0
Authors: 12
Year: 2026
Abstract
Recent advances in foundation models (FMs) have enabled artificial intelligence systems to acquire generalizable capabilities, with language and visual models demonstrating adaptability across diverse healthcare applications and multimodal vision-language FMs (VLFMs) supporting complex tasks such as report generation and question answering. Building on these developments, autonomous agents, particularly vision language agents (VLAs), represent the next frontier, extending AI from perception and recognition to cognition, decision-making, and action. By integrating multimodal understanding with planning, interaction, and tool use, VLAs introduce autonomous intelligence capable of managing multi-step clinical workflows, adapting over time, and collaborating with clinicians. This survey provides a comprehensive perspective on VLAs in healthcare. We examine their current progress, analyze the major challenges hindering their integration, and highlight promising directions for future development. Unlike prior surveys that primarily focus on diverse FMs, our work prioritizes emerging VLFMs and incorporates agentic concepts, emphasizing their transformative potential beyond narrow research prototypes. Specifically, we (i) evaluate VLAs in terms of VLFM architectures and adaptation strategies and agentic design, (ii) assess their potential impact across diverse clinical workflows, and (iii) discuss pathways towards next-generation clinical decision support, considering technical and clinical challenges.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,758 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,666 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,220 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,896 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations