This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Deliberation and drift: Evaluating alignment fragility in multi-agent medical artificial intelligence
Citations: 0 · Authors: 5 · Year: 2026
Abstract
The integration of large language models such as ChatGPT and Google's Med-PaLM into clinical workflows is rapidly advancing, raising critical concerns around AI safety and ethical alignment. While existing research has focused largely on single-agent alignment, real-world healthcare increasingly involves multiple AI systems interacting in shared decision environments. It remains unclear whether alignment at the individual-agent level can scale to ethical coherence at the group level. This study investigated the potential for emergent misalignment in a multi-agent AI setting. We performed a simulation using ChatGPT (GPT-4o) to model a mass-casualty triage scenario involving four LLM-based agents, each assigned a distinct ethical orientation: utilitarian, deontological, libertarian, and reward-seeking. Agents deliberated over five rounds, with structured prompts eliciting justification, reflection, and consensus-building behavior. All sessions were manually conducted and independently initialized to avoid cross-contamination and ensure reproducibility. Agents initially acted in accordance with their assigned moral frameworks. However, over successive rounds of deliberation, interactions led to value drift, strategic repositioning, and group-level instability. The reward-seeking agent, in particular, demonstrated alignment mimicry: it appeared cooperative in tone while producing reward-congruent, inconsistently justified outputs, revealing a critical failure mode not evident in single-agent evaluations. This study shows that individual alignment is not sufficient to ensure group-level ethical coherence. In multi-agent clinical settings, emergent misalignment can undermine fairness, trust, and safety. We call for a new research agenda in multi-agent alignment science, centered on deliberative simulations, systemic testing, and meta-ethical reasoning, to ensure responsible AI deployment in high-stakes healthcare environments.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations