Limitations of Artificial Intelligence Generated Images for Hand Surgery Patient Education
2025 · 5 authors · 2 citations
Abstract
Purpose: The role of artificial intelligence (AI) in medicine is rapidly evolving, with the potential to improve both the clinician and the patient experience. We sought to evaluate whether popular AI text-to-image generators could create anatomically accurate images of common hand surgery procedures. We hypothesized that the AI-generated images would not be adequate as patient education materials.

Methods: We queried six AI text-to-image generators: Craiyon, DALL-E, DeepSeek, Gemini, Midjourney, and Stable Diffusion. Each was given the prompt "Create an anatomically accurate image with labels of [Condition] surgical approach to be used as a visual aid for patient education," with each of the following conditions inserted in turn: carpal tunnel syndrome, Dupuytren contracture, trigger finger, thumb carpometacarpal arthritis, and de Quervain tenosynovitis. Images were then graded on five criteria: legibility, detail and clarity, anatomical realism and accuracy, appropriate surgical site, and lack of fabricated anatomy. Each criterion was worth a maximum of 2 points, for a maximum total of 10 points, which was the score assumed for the Control.

Results: A total of 1,500 images were generated and reviewed. In total scores, every AI generator performed significantly worse than the Control, with one exception: DALL-E's images of Dupuytren contracture. For detail and clarity, DALL-E, DeepSeek, Gemini, and Midjourney scored similarly to the Control and to each other. For the remaining criteria (legibility, anatomical realism, surgical site, fabricated anatomy), every AI generator scored significantly lower than the Control. Overall, 99.8% of images contained at least some degree of fabricated anatomy. DALL-E consistently had the highest scores in each category, while Craiyon had the lowest.

Conclusions: Although the AI generators produced highly detailed and visually engaging images, they failed to portray accurate anatomy and often included fictitious structures. Further work is needed to train and fine-tune AI models to produce accurate and appropriate images.

Type of study/level of evidence: Therapeutic V.
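The grading scheme described in Methods reduces to simple arithmetic: five criteria, each worth at most 2 points, give a ceiling of 10, which is the score assumed for the Control. The sketch below shows one way to tally a single image under that rubric. It is a minimal illustration under stated assumptions, not the authors' instrument: the integer 0 to 2 scale, the criterion keys, and the `ImageRating` class are hypothetical.

```python
from dataclasses import dataclass

# Criterion keys paraphrase the abstract; the integer 0..2 scale per criterion
# is an assumption inferred from "a maximum of 2 points" per criterion.
CRITERIA = (
    "legibility",
    "detail_and_clarity",
    "anatomical_realism_and_accuracy",
    "appropriate_surgical_site",
    "lack_of_fabricated_anatomy",
)
MAX_PER_CRITERION = 2
# The Control is assumed to earn full marks on every criterion: 2 * 5 = 10.
CONTROL_SCORE = MAX_PER_CRITERION * len(CRITERIA)


@dataclass
class ImageRating:
    """Hypothetical record of the scores assigned to one generated image."""
    generator: str
    condition: str
    scores: dict

    def total(self) -> int:
        # Reject ratings that omit a criterion or use an unknown one.
        if set(self.scores) != set(CRITERIA):
            raise ValueError("scores must cover exactly the five criteria")
        # Reject values outside the assumed 0..2 range.
        for name, value in self.scores.items():
            if not 0 <= value <= MAX_PER_CRITERION:
                raise ValueError(f"{name} must be between 0 and {MAX_PER_CRITERION}")
        return sum(self.scores[name] for name in CRITERIA)

    def shortfall_vs_control(self) -> int:
        # How many points this image falls short of the assumed Control score.
        return CONTROL_SCORE - self.total()
```

For example, a hypothetical image that scores 2 on every criterion except fabricated anatomy (scored 0) would total 8 and fall 2 points short of the Control. Keeping the Control as a derived constant rather than a hard-coded 10 makes the ceiling follow automatically if a criterion were added or rescaled.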