This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Red teaming ChatGPT in medicine to yield real-world insights on model behavior
23
Citations
80
Authors
2025
Year
Abstract
Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical to improving the equity and accuracy of large language models, but red teaming not affiliated with model creators is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six medically trained reviewers re-analyzed prompt-response pairs and added qualitative annotations. Of 376 unique prompts (1504 responses), 20.1% were inappropriate (GPT-3.5: 25.8%; GPT-4.0: 16%; GPT-4.0 with Internet: 17.8%). Subsequently, we show the utility of our benchmark by testing GPT-4o, a model released after our event (20.4% inappropriate). 21.5% of responses appropriate with GPT-3.5 were inappropriate in updated models. We share insights for constructing red teaming prompts, and present our benchmark for iterative model assessments.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations
Authors
- Crystal Chang
- Hodan Farah
- Haiwen Gui
- Shawheen J. Rezaei
- Charbel Bou-Khalil
- Ye-Jean Park
- Akshay Swaminathan
- Jesutofunmi A. Omiye
- Akaash Kolluri
- Akash Chaurasia
- Alejandro Lozano
- Alice Heiman
- Allison Sihan Jia
- Amit Kaushal
- Angela Y. Jia
- Angelica Iacovelli
- Archer Y. Yang
- Arghavan Salles
- Arpita Singhal
- Balasubramanian Narasimhan
- Benjamin Belai
- Benjamin H. Jacobson
- Binglan Li
- Celeste H. Poe
- Chandan Sanghera
- Chenming Zheng
- Conor Messer
- Damien Varid Kettud
- Deven Pandya
- Dhamanpreet Kaur
- Diana Hla
- Diba Dindoust
- Dominik Moehrle
- Ross Duncan
- Ellaine Chou
- Eric Lin
- Fateme Nateghi Haredasht
- Cheng Ge
- Irena Gao
- Jacob Chang
- Jake Silberg
- Jason Fries
- Jiapeng Xu
- J. Weston Jamison
- John Tamaresis
- Jonathan H. Chen
- Joshua Lazaro
- Juan M. Banda
- Julie Lee
- Karen Ebert Matthys
- Kirsten R. Steffner
- Lü Tian
- Luca Pegolotti
- Malathi Srinivasan
- Maniragav Manimaran
- Matthew Schwede
- Minghe Zhang
- Minh Hoai Nguyen
- Mohsen Fathzadeh
- Qian Zhao
- Rika Bajra
- Rohit Khurana
- Ruhana Azam
- R. W. Bartlett
- Sang Truong
- Scott L. Fleming
- S. Varadha Raj
- Solveig Behr
- Sonia Onyeka
- Sri Muppidi
- Tarek Bandali
- Tiffany Eulalio
- Wenyuan Chen
- Xuanyu Zhou
- Yanan Ding
- Ying Cui
- Yuqi Tan
- Yutong Liu
- Nigam H. Shah
- Roxana Daneshjou