This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Independent bone-level diagnostic accuracy study of an AI tool for detecting appendicular skeletal fractures on radiographs
0
Citations
7
Authors
2026
Year
Abstract
OBJECTIVES: To perform an in-depth evaluation of the diagnostic test accuracy of a commercially available AI tool for assistance in fracture detection on radiographs.
MATERIALS AND METHODS: This retrospective study included consecutive patients with trauma radiographs at seven Danish hospitals. The AI output was evaluated against the clinical radiologic report, which served as the reference standard for a binary fracture outcome. The report is based on assessments by an emergency physician, a senior orthopedic surgeon, and a radiology expert. Sensitivity, specificity, and positive and negative predictive values were calculated. Sensitivity and specificity were additionally stratified by children, degenerative disease, metal, old fractures, casting, obvious fractures, and inter-hospital differences. Bone-wise sensitivity and specificity were assessed for multiple-fracture cases and individual bones.
RESULTS: The study sample consisted of 2783 patients (median age 38 years, IQR 21-64; 1443 female), of whom 948 (34%) had the target finding. The AI tool demonstrated an overall sensitivity of 89% (95% CI: 87%-91%) and specificity of 88% (95% CI: 86%-89%). The specificity was 57% (95% CI: 49%-65%) in examinations with old fractures. Bone-wise sensitivity for carpal fractures ranged from 25% (95% CI: 1%-81%) for other carpals to 75% (95% CI: 43%-95%) for the triquetrum. For tarsal fractures, it ranged from 0% (95% CI: 0%-60%) for the medial cuneiform to 53% (95% CI: 27%-79%) for the talus.
CONCLUSION: The AI tool demonstrated high overall diagnostic accuracy and performed robustly across most specific situations. However, specificity was substantially reduced in the presence of old fractures. The bone-wise analysis showed great variability, with a pattern of poor accuracy for short, irregular bones.
KEY POINTS:
Question: Can a commercially available AI tool reliably detect fractures across anatomical regions, confounding factors, and individual bones, and are there patterns in diagnostic limitations?
Findings: The AI tool achieved 89% sensitivity and 88% specificity with consistent accuracy across subgroups. However, accuracy dropped for old fractures and irregular short bones.
Clinical relevance: Despite broad regulatory approval, AI fracture tools may have clinically relevant weaknesses that go unnoticed. Our in-depth evaluation highlights limitations, guiding responsible clinical use and future research to support safe AI implementation in radiology and informed medicolegal regulation.
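The accuracy measures named in the abstract can be illustrated with a minimal Python sketch. It computes sensitivity, specificity, and positive and negative predictive values from a 2x2 table, and an exact (Clopper-Pearson) 95% confidence interval for a proportion. The abstract does not state which interval method the authors used or the per-bone case counts, so the method choice, the function names, and all counts below are assumptions for illustration only.

```python
from math import exp, lgamma, log


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 accuracy measures against a binary reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # detected fractures / all fractures
        "specificity": tn / (tn + fp),  # correct negatives / all non-fractures
        "ppv": tp / (tp + fp),          # true fractures / all AI-positive cases
        "npv": tn / (tn + fn),          # true negatives / all AI-negative cases
    }


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def _solve_decreasing(f, target, iterations=60):
    """Bisection on (0, 1) for f(p) == target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    lower = 0.0 if successes == 0 else _solve_decreasing(
        lambda p: binom_cdf(successes - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if successes == n else _solve_decreasing(
        lambda p: binom_cdf(successes, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    print(diagnostic_metrics(tp=8, fp=2, fn=2, tn=8))
    # Hypothetical: 0 detected out of 4 fractures of a rare bone.
    print(clopper_pearson(0, 4))
```

With the hypothetical input of 0 detected out of 4 cases, the exact interval runs from 0% to roughly 60%. This illustrates why bone-wise estimates for rarely fractured bones carry very wide intervals. That it resembles the interval reported for the medial cuneiform does not confirm the study's actual counts or method.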