OpenAlex · Updated hourly · Last updated: 23.05.2026, 01:41

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Visual-language foundation models and artificial intelligence agents in healthcare: Bridging from technological innovation to clinical impact

2026 · 0 citations · Computational Visual Media · Open Access

Citations: 0 · Authors: 12 · Year: 2026

Abstract

Recent advances in foundation models (FMs) have enabled artificial intelligence systems to acquire generalizable capabilities: language and visual models have demonstrated adaptability across diverse healthcare applications, and multimodal vision-language FMs (VLFMs) support complex tasks such as report generation and question answering. Building on these developments, autonomous agents, particularly vision-language agents (VLAs), represent the next frontier, extending AI from perception and recognition to cognition, decision-making, and action. By integrating multimodal understanding with planning, interaction, and tool use, VLAs introduce autonomous intelligence capable of managing multi-step clinical workflows, adapting over time, and collaborating with clinicians. This survey provides a comprehensive perspective on VLAs in healthcare. We examine their current progress, analyze the major challenges hindering their integration, and highlight promising directions for future development. Unlike prior surveys that primarily focus on diverse FMs, our work prioritizes emerging VLFMs and incorporates agentic concepts, emphasizing their transformative potential beyond narrow research prototypes. Specifically, we (i) evaluate VLAs in terms of VLFM architectures, adaptation strategies, and agentic design, (ii) assess their potential impact across diverse clinical workflows, and (iii) discuss pathways towards next-generation clinical decision support, considering technical and clinical challenges.

Similar works