This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Exposing the fragility of LLM reasoning through bias-inducing prompts: evidence from BiasMedQA
Citations: 0 · Authors: 12 · Year: 2026
Abstract
Objectives To evaluate the impact of large language model (LLM) reasoning on model susceptibility to cognitive-bias-inducing prompts.

Methods and analysis The performance of Llama-3.3-70B, Qwen3-32B and Gemini-2.5-Flash, along with their reasoning-enhanced variants, was evaluated on the public BiasMedQA dataset, which was developed to probe seven established cognitive biases across 1273 clinical case vignettes. Each model was tested with a base prompt, a debiasing prompt instructing it to actively mitigate cognitive bias, and a few-shot prompt containing additional sample cases of biased responses. Beyond the seven biases in BiasMedQA, Gemini-2.5-Flash was additionally tested with four unpublished bias-inducing prompts to uncover signs of potential data contamination and to probe brittleness directly. For each model pair, two mixed-effects logistic regression models were fitted to determine the impact of biases and mitigation strategies on performance.

Results In all three models, the reasoning-enhanced variant achieved higher rates of correct responses (Llama-3.3-70B: 72.5–82.1% vs 61.0–73.4%; Qwen3-32B: 71.7–78.7% vs 55.5–64.1%; Gemini-2.5-Flash: 81.8–88.6% vs 80.0–83.7%). The performance of Gemini-2.5-Flash dropped considerably when it was exposed to the four additional unpublished bias-inducing prompts (from 80.0–88.6% to 47.4–86.1%), hinting at potential contamination of its training data and exposing underlying brittleness. In Llama-3.3-70B and Gemini-2.5-Flash, reasoning amplified vulnerability to several bias-inducing prompts, while in Qwen3-32B reasoning reduced susceptibility to one of the seven biases. The debiasing and few-shot prompting approaches produced statistically significant reductions in biased responses across all three model architectures.

Conclusion In none of the three LLMs was reasoning able to consistently reduce vulnerability to bias-inducing prompts, revealing the fragility of the reasoning capabilities purported by the model developers.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,349 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,219 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,631 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,480 citations