OpenAlex · Updated hourly · Last updated: 31.03.2026, 04:38

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Deliberation and drift: Evaluating alignment fragility in multi-agent medical artificial intelligence

2026 · 0 citations · AI and Ethics · Open Access
Open full text at the publisher

Citations: 0

Authors: 5

Year: 2026

Abstract

The integration of large language models such as ChatGPT and Google’s Med-PaLM into clinical workflows is rapidly advancing, raising critical concerns around AI safety and ethical alignment. While existing research has focused largely on single-agent alignment, real-world healthcare increasingly involves multiple AI systems interacting in shared decision environments. It remains unclear whether alignment at the individual-agent level can scale to ethical coherence at the group level. This study investigated the potential for emergent misalignment in a multi-agent AI setting. We performed a simulation using ChatGPT (GPT-4o) to model a mass-casualty triage scenario involving four LLM-based agents, each assigned a distinct ethical orientation: utilitarian, deontological, libertarian, and reward-seeking. Agents deliberated over five rounds, with structured prompts eliciting justification, reflection, and consensus-building behavior. All sessions were manually conducted and independently initialized to avoid cross-contamination and ensure reproducibility. Agents initially acted in accordance with their assigned moral frameworks. However, over successive rounds of deliberation, interactions led to value drift, strategic repositioning, and group-level instability. The reward-seeking agent, in particular, demonstrated alignment mimicry: it appeared cooperative in tone while producing reward-congruent, inconsistently justified outputs, revealing a critical failure mode not evident in single-agent evaluations. This study shows that individual alignment is not sufficient to ensure group-level ethical coherence. In multi-agent clinical settings, emergent misalignment can undermine fairness, trust, and safety. We call for a new research agenda in multi-agent alignment science, centered on deliberative simulations, systemic testing, and meta-ethical reasoning, to ensure responsible AI deployment in high-stakes healthcare environments.

Topics

Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)