This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Multicenter deep learning for multi-abnormality screening on hip radiographs: development, external validation, and assisted reader study
Citations: 0
Authors: 10
Year: 2026
Abstract
• A deep learning model was developed and validated for the simultaneous screening of eight common hip abnormalities using a large, multicenter dataset of 30,508 hip radiographs.
• On an external test set, the model achieved an overall accuracy of 90.11% and a macro-averaged area under the curve of 0.99.
• The model demonstrated high sensitivity for acute injuries, including femoral neck fracture (95.61%) and intertrochanteric fracture (95.31%).
• The model’s accuracy was significantly higher than that of resident and attending surgeons and was comparable to that of deputy chief surgeons.
• With model assistance, the diagnostic accuracy of all participating surgeons improved significantly.

Hip abnormalities are a major cause of pain and functional impairment, yet missed diagnoses remain common due to high clinical workloads and interobserver variability. Existing deep learning (DL) models mostly target single pathologies, limiting their utility for comprehensive screening.

This study aimed to develop and validate a DL model for simultaneous screening of multiple hip conditions using a large, multicenter dataset of pelvic radiographs.

A ResNet-50-based model with three enhancements (convolutional block attention module, generalized mean pooling, and class-balanced focal loss) was trained and internally validated on data from two hospitals (25,908 hips) to classify eight categories: normal, hip osteoarthritis, osteonecrosis of the femoral head, femoral neck fracture (FNF), intertrochanteric fracture (ITF), developmental dysplasia of the hip, total hip arthroplasty, and metallic internal fixation. External testing used 4,600 hips from a third hospital. Model performance was compared with that of six orthopedic surgeons of varying seniority, and a two-phase reader study assessed diagnostic performance with and without model assistance; non-inferiority was prespecified at a 5% margin for accuracy and macro-F1.
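The abstract names class-balanced focal loss but does not give its hyperparameters or exact formulation. The following is a minimal sketch of one common formulation (effective-number class weights combined with a softmax focal term), not the authors' implementation. The values of `beta` and `gamma`, the softmax variant, and the per-class counts are all assumptions made for illustration.

```python
import math

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights proportional to (1 - beta) / (1 - beta**n_c).

    beta is an assumed hyperparameter; the abstract does not report it.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)  # normalize so the weights sum to the class count
    return [w * scale for w in raw]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cb_focal_loss(logits, target, weights, gamma=2.0):
    """Class-balanced focal loss for one sample (softmax variant, assumed).

    The factor (1 - p_t)**gamma down-weights samples the model already
    classifies confidently; gamma=0 reduces to weighted cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical per-class training counts for the eight categories
# (illustrative only; the abstract does not report the class distribution).
counts = [12000, 4000, 3000, 1500, 900, 800, 2500, 1200]
weights = class_balanced_weights(counts)

# Loss for one sample whose true class is the rarest one (index 5).
example_loss = cb_focal_loss([0.2, 0.1, 0.0, 0.3, 0.1, 1.5, 0.0, 0.2], 5, weights)
```

In this sketch the rarest class receives the largest weight, so errors on under-represented categories contribute more to the training signal than errors on the majority class.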
On the internal test set, the model achieved 93.93% accuracy (95% confidence interval [CI]: 93.26–94.56), macro-AUC 0.99, and macro-F1 90.66% (95% CI: 88.83–92.14). On the external test set, accuracy was 90.11% (95% CI: 89.28–90.98), with macro-AUC 0.99 and macro-F1 87.29% (95% CI: 86.02–88.56). Sensitivity was high for acute injuries (FNF: 95.61% [95% CI: 92.82–98.17]; ITF: 95.31% [95% CI: 90.36–98.94]). The model outperformed residents/attendings (P < 0.001) and was non-inferior to deputy chiefs. Assistance improved surgeon accuracy (gains: 3.93%–11.33%) and inter-rater agreement (κ: 0.52–0.88 to 0.69–0.88).

We developed and externally validated a DL model for automated screening of multiple common hip abnormalities. The system is well-suited for triage workflows by flagging high-risk cases for expedited review and may function as a supportive second reader in high-volume or resource-limited settings.
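Two of the reported metrics, macro-F1 and κ, can be illustrated on toy data. The sketch below computes macro-averaged F1 and Cohen's kappa for two raters. The abstract does not state which kappa variant was used for the six readers, so the two-rater form is an assumption, and the labels are invented for illustration rather than taken from the study.

```python
from collections import Counter

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Toy labels (hypothetical, not study data)
truth = ["normal", "FNF", "ITF", "FNF", "normal", "ITF"]
reader = ["normal", "FNF", "FNF", "FNF", "normal", "ITF"]

f1 = macro_f1(truth, reader, ["normal", "FNF", "ITF"])
kappa = cohen_kappa(truth, reader)
```

Macro averaging gives every class the same weight regardless of its frequency, which is why it is commonly reported alongside accuracy when class sizes differ.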