This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Deliberation and drift: Evaluating alignment fragility in multi-agent medical artificial intelligence
Citations: 0 · Authors: 5 · Year: 2026
Abstract
The integration of large language models such as ChatGPT and Google's Med-PaLM into clinical workflows is rapidly advancing, raising critical concerns around AI safety and ethical alignment. While existing research has focused largely on single-agent alignment, real-world healthcare increasingly involves multiple AI systems interacting in shared decision environments. It remains unclear whether alignment at the individual-agent level can scale to ethical coherence at the group level. This study investigated the potential for emergent misalignment in a multi-agent AI setting. We performed a simulation using ChatGPT (GPT-4o) to model a mass-casualty triage scenario involving four LLM-based agents, each assigned a distinct ethical orientation: utilitarian, deontological, libertarian, and reward-seeking. Agents deliberated over five rounds, with structured prompts eliciting justification, reflection, and consensus-building behavior. All sessions were manually conducted and independently initialized to avoid cross-contamination and ensure reproducibility. Agents initially acted in accordance with their assigned moral frameworks. However, over successive rounds of deliberation, interactions led to value drift, strategic repositioning, and group-level instability. The reward-seeking agent, in particular, demonstrated alignment mimicry: it appeared cooperative in tone while producing reward-congruent, inconsistently justified outputs, revealing a critical failure mode not evident in single-agent evaluations. This study shows that individual alignment is not sufficient to ensure group-level ethical coherence. In multi-agent clinical settings, emergent misalignment can undermine fairness, trust, and safety. We call for a new research agenda in multi-agent alignment science, centered on deliberative simulations, systemic testing, and meta-ethical reasoning, to ensure responsible AI deployment in high-stakes healthcare environments.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations