This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Medical large language models are vulnerable to data-poisoning attacks
Citations: 121
Authors: 33
Year: 2025
Abstract
The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety.
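The screening strategy the abstract describes, validating stochastically generated LLM claims against hard-coded knowledge-graph relationships, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the triple set, the upstream extraction step it assumes, and the function names are all hypothetical.

```python
from collections.abc import Iterable

# A (subject, relation, object) triple.
Triple = tuple[str, str, str]

# Hypothetical stand-in for a curated biomedical knowledge graph.
# A real deployment would query a large graph of verified medical
# relationships rather than a hard-coded set.
VERIFIED_TRIPLES: set[Triple] = {
    ("metformin", "treats", "type 2 diabetes"),
    ("aspirin", "interacts_with", "warfarin"),
}

def screen_claims(claims: Iterable[Triple]) -> list[Triple]:
    """Return the extracted claims that the knowledge graph does not support.

    `claims` is assumed to come from an upstream medical entity- and
    relation-extraction step run over the LLM output. Any claim absent
    from the graph is flagged for review instead of being surfaced.
    """
    return [c for c in claims if c not in VERIFIED_TRIPLES]

if __name__ == "__main__":
    extracted = [
        ("metformin", "treats", "type 2 diabetes"),  # supported by the graph
        ("metformin", "treats", "hypertension"),     # unsupported -> flagged
    ]
    print(screen_claims(extracted))
    # -> [('metformin', 'treats', 'hypertension')]
```

Flagging unsupported claims (rather than blocking only known-false ones) reflects the conservative posture appropriate for medical outputs: a claim the graph cannot validate is treated as potentially harmful until reviewed.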
Related Work
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations
Authors
- Daniel Alexander Alber
- Zihao Yang
- Anton Alyakin
- Eunice Yang
- N. Shesh
- Aly Valliani
- Jeff Zhang
- Gabriel R. Rosenbaum
- Ashley K. Amend-Thomas
- David B. Kurland
- C. Kremer
- Alexander Eremiev
- Bruck Negash
- Daniel D. Wiggan
- M. Nakatsuka
- Karl L. Sangwon
- Sean N. Neifert
- Hammad A. Khan
- Akshay Save
- Adhith Palla
- Eric A. Grin
- Monika Hedman
- Mustafa Nasir-Moin
- Xujin Chris Liu
- Lavender Yao Jiang
- Michal Mankowski
- Dorry L. Segev
- Yindalon Aphinyanaphongs
- Howard A. Riina
- John G. Golfinos
- Daniel A. Orringer
- Douglas Kondziolka
- Eric K. Oermann