This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

AI-generated dermatologic images show deficient skin tone diversity and poor diagnostic accuracy: An experimental study

2025 · 12 citations · Journal of the European Academy of Dermatology and Venereology

Abstract

Background: Generative AI models are increasingly used in dermatology, yet biases in training datasets may reduce diagnostic accuracy and perpetuate ethnic health disparities.

Objectives: To evaluate two key AI outputs: (1) skin tone representation and (2) diagnostic accuracy of generated dermatologic conditions.

Methods: Using the standard prompt 'Generate a photo of a person with [skin condition],' this cross-sectional study investigated skin tone diversity and accuracy of four leading AI models (Adobe Firefly, ChatGPT-4o, Midjourney and Stable Diffusion) across the 20 most common skin conditions. All images (n = 4000) were evaluated for skin tone representation from June to July 2024. Two independent raters used the Fitzpatrick scale to assess skin tone diversity compared to U.S. Census demographics using χ². Two blinded dermatology residents evaluated a randomized 200-image subset for diagnostic accuracy. An inter-rater kappa statistic was calculated to assess rater agreement.

Results: Across all generated images, 89.8% depicted light skin, and 10.2% depicted dark skin. Adobe Firefly demonstrated the highest alignment with U.S. demographic data, with a non-significant chi-square result (38.1% dark skin, χ²(1) = 0.320, p = 0.572), indicating no meaningful difference between its generated skin tone diversity and census demographics. ChatGPT-4o, Midjourney and Stable Diffusion significantly underrepresented dark skin with Fitzpatrick scores of >IV (6.0%, 3.9% and 8.7% dark skin, respectively; all p < 0.001). Across all platforms, only 15% of images were identifiable by raters as the intended condition. Adobe Firefly had the lowest accuracy (0.94%), while ChatGPT-4o, Midjourney and Stable Diffusion demonstrated higher but still suboptimal accuracy (22%, 12.2% and 22.5%, respectively).

Conclusions: The study highlights substantial deficiencies in the diversity and accuracy of AI-generated dermatological images. AI programs may exacerbate cognitive bias and health inequity, suggesting the need for ethical AI guidelines and diverse datasets to improve disease diagnosis and dermatologic care.
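The two statistics named in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes a binary light/dark grouping (consistent with the reported single degree of freedom) and Cohen's kappa for two raters, which the abstract does not specify. The function names and any numbers passed in are hypothetical, and the census reference share is not given in the abstract, so it is left as a parameter.

```python
import math
from collections import Counter


def chi_square_gof_binary(observed_dark, n_images, reference_dark_share):
    """Chi-square goodness-of-fit test for two categories (df = 1).

    reference_dark_share is the expected proportion of dark-skin images;
    the abstract does not state it, so callers must supply their own value.
    """
    expected_dark = n_images * reference_dark_share
    expected_light = n_images - expected_dark
    observed_light = n_images - observed_dark
    chi2 = ((observed_dark - expected_dark) ** 2 / expected_dark
            + (observed_light - expected_light) ** 2 / expected_light)
    # For df = 1 the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(chi_square_gof_binary(observed_dark=60, n_images=100, reference_dark_share=0.5))
    print(cohen_kappa(["hit", "hit", "miss", "miss"], ["hit", "miss", "miss", "miss"]))
```

A small p-value from the first function would indicate that the generated skin tone split differs from the reference share. A kappa near 1 from the second would indicate strong agreement between the two raters beyond chance.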

Topics

Cutaneous Melanoma Detection and Management · Artificial Intelligence in Healthcare and Education · AI in cancer detection
