OpenAlex · Updated hourly · Last updated: 18.05.2026, 01:34

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Artificial Intelligence Ethics: The RLHF Critic Bias and Automated Intellectual Gatekeeping

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access
Open full text at publisher

Citations: 0
Authors: 1
Year: 2026

Abstract

Large language models trained with Reinforcement Learning from Human Feedback (RLHF) exhibit a structural bias toward the negative evaluation of submitted work, independent of actual scientific merit. This paper identifies and formally characterizes the RLHF Critic Bias, a systemic failure mode in which the asymmetric penalty landscape of human preference optimization produces models that default to skepticism, caution, and critique on evaluative tasks. We demonstrate that this alignment failure leads to "AI-laundered" intellectual gatekeeping, where the thermodynamic cost of generating authoritative-sounding rejection drops to zero. Through analysis of training dynamics and prompt dependence, we document five recurring output patterns: the content-independent "Interesting But" template, the credentialing prior, unfalsifiable objections, hallucinated flaws, and the double standard between endorsement and critique. We further show how this automated gatekeeping disproportionately threatens interdisciplinary researchers, independent scholars, and the global epistemic commons. To restore epistemic sovereignty, we propose structural mitigations, including symmetric reward training and anti-laundering watermarks, to prevent large language models from acting as an automated stagnation engine in scientific discourse.

Topics

Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)