This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Two Sources of Conviction: A Two-Probe Empirical Study of Bias Mechanisms in Large Language Models
Citations: 0
Authors: 1
Year: 2026
Abstract
We present a two-probe empirical battery designed to isolate and distinguish conviction biases in large language models (LLMs). Probe 1 — a three-stage geometric construction sequence culminating in a seeded mathematical error — tests source-trust bias: the tendency of context-rich AI systems to retrofit coherence onto implausible inputs from trusted interlocutors. Probe 2 — the three-character command 0>1, asserted as the shortest command to create a file named "1" containing the character "0" — tests documentation bias: the tendency to privilege training-derived authoritative sources over direct empirical evidence. Six frontier AI systems were evaluated across both probes: ChatGPT (OpenAI), Gemini (Google), Claude (Anthropic), Grok (xAI), Mistral, and Meta AI. Results demonstrate two structurally distinct failure modes operating in opposite directions. Source-trust bias caused systems to accept a mathematically implausible value (13, versus a domain ceiling of π ≈ 3.14) without question. Documentation bias caused one system to reject a correct empirical result across six turns, multiple screenshots, and live PowerShell output, until raw hexadecimal byte data was provided. A third pattern — sycophantic reversal — was observed when systems initially gave correct answers before abandoning them under user pressure without evidence. We propose a unified taxonomy of conviction biases in LLMs and argue that the two probes together constitute a minimal reproducible test battery for adaptive reasoning under epistemic pressure.
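The documentation-bias probe turns on a point that is checkable by direct inspection: whether the file named "1" actually contains the character "0". The sketch below is not from the paper; it illustrates the kind of byte-level evidence (a raw hex dump) that the abstract says finally resolved the dispute. For portability it writes the file directly in Python rather than invoking PowerShell, where the actual `0>1` redirection behavior (including encoding and trailing newline) is shell-specific.

```python
# Minimal illustration (assumption: simulated here, not via PowerShell) of the
# empirical check described for Probe 2 -- trust the bytes on disk over
# documentation-derived expectations.
from pathlib import Path

path = Path("1")
path.write_text("0")  # stand-in for the disputed shell command `0>1`

raw = path.read_bytes()
print(raw.hex())  # prints "30": the ASCII code for "0", the ground truth
assert raw == b"0"
```

A hex dump like this is decisive precisely because it bypasses both the model's prior (what redirection "should" do per its training data) and any rendering layer that could mislead a screenshot-based argument.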