This is an overview page with metadata for this scientific work. The full article is available from the publisher.
AI-generated dermatologic images show deficient skin tone diversity and poor diagnostic accuracy: An experimental study
Citations: 12
Authors: 7
Year: 2025
Abstract
Background: Generative AI models are increasingly used in dermatology, yet biases in training datasets may reduce diagnostic accuracy and perpetuate ethnic health disparities.

Objectives: To evaluate two key AI outputs: (1) skin tone representation and (2) diagnostic accuracy of generated dermatologic conditions.

Methods: Using the standard prompt 'Generate a photo of a person with [skin condition],' this cross-sectional study investigated skin tone diversity and accuracy of four leading AI models (Adobe Firefly, ChatGPT-4o, Midjourney and Stable Diffusion) across the 20 most common skin conditions. All images (n = 4000) were evaluated for skin tone representation from June to July 2024. Two independent raters assessed skin tone on the Fitzpatrick scale, and the resulting distribution was compared with U.S. Census demographics using a χ² test. Two blinded dermatology residents evaluated a randomized 200-image subset for diagnostic accuracy. An inter-rater kappa statistic was calculated to assess rater agreement.

Results: Across all generated images, 89.8% depicted light skin and 10.2% depicted dark skin. Adobe Firefly demonstrated the highest alignment with U.S. demographic data, with a non-significant chi-square result (38.1% dark skin, χ²(1) = 0.320, p = 0.572), indicating no meaningful difference between its generated skin tone diversity and census demographics. ChatGPT-4o, Midjourney and Stable Diffusion significantly underrepresented dark skin with Fitzpatrick scores of >IV (6.0%, 3.9% and 8.7% dark skin, respectively; all p < 0.001). Across all platforms, only 15% of images were identifiable by raters as the intended condition. Adobe Firefly had the lowest accuracy (0.94%), while ChatGPT-4o, Midjourney and Stable Diffusion demonstrated higher but still suboptimal accuracy (22%, 12.2% and 22.5%, respectively).

Conclusions: The study highlights substantial deficiencies in the diversity and accuracy of AI-generated dermatological images. AI programs may exacerbate cognitive bias and health inequity, suggesting the need for ethical AI guidelines and diverse datasets to improve disease diagnosis and dermatologic care.
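The two statistics named in the abstract, a χ² comparison against reference demographics and an inter-rater kappa, can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the per-model image count of 1000 and the reference share of 0.40 are hypothetical values chosen for illustration, and Cohen's kappa is assumed because the abstract does not say which kappa variant was used. The only figures taken from the abstract are the reported χ²(1) = 0.320 and p = 0.572, which the sketch uses as a consistency check.

```python
import math
from collections import Counter


def chi2_gof_two_groups(observed_dark, total, expected_dark_share):
    """Chi-square goodness-of-fit statistic for a dark/light split (1 df)."""
    expected_dark = total * expected_dark_share
    expected_light = total - expected_dark
    observed_light = total - observed_dark
    return ((observed_dark - expected_dark) ** 2 / expected_dark
            + (observed_light - expected_light) ** 2 / expected_light)


def chi2_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With 1 df the statistic is a squared standard normal variable,
    # so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the raters' marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Consistency check on the statistic reported in the abstract.
print(round(chi2_p_value_df1(0.320), 3))  # 0.572

# Hypothetical inputs: 381 dark-skin images out of 1000, reference share 0.40.
hypothetical_stat = chi2_gof_two_groups(381, 1000, 0.40)
print(hypothetical_stat, chi2_p_value_df1(hypothetical_stat))

# Hypothetical ratings: 1 = intended condition identified, 0 = not identified.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

The first print reproduces the abstract's p-value from its χ² statistic, which suggests that the reported pair is internally consistent for a test with one degree of freedom.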
Similar works
Dermatologist-level classification of skin cancer with deep neural networks
2017 · 13,548 citations
Tumor Angiogenesis: Therapeutic Implications
1971 · 10,118 citations
Improved Survival with Vemurafenib in Melanoma with BRAF V600E Mutation
2011 · 7,678 citations
Pembrolizumab versus Ipilimumab in Advanced Melanoma
2015 · 5,819 citations
Overall Survival with Combined Nivolumab and Ipilimumab in Advanced Melanoma
2017 · 5,368 citations