OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 01.04.2026, 08:09

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Comparative Evaluation of Parallel and Sequential Hybrid CNN–ViT Models for Wrist X-Ray Anomaly Detection

2025·1 Zitationen·Applied SciencesOpen Access
Volltext beim Verlag öffnen

1

Zitationen

2

Autoren

2025

Jahr

Abstract

Medical anomaly detection is challenged by limited labeled data and domain shifts, which reduce the performance and generalization of deep learning (DL) models. Hybrid convolutional neural network–Vision Transformer (CNN–ViT) architectures have shown promise, but they often rely on large datasets. Multistage transfer learning (MTL) provides a practical strategy to address this limitation. In this study, we evaluated parallel hybrids, where convolutional neural network (CNN) and Vision Transformer (ViT) features are fused after independent extraction, and sequential hybrids, where CNN features are passed through the ViT for integrated processing. Models were pretrained on non-wrist musculoskeletal radiographs (MURA), fine-tuned on the MURA wrist subset, and evaluated for cross-domain generalization on an external wrist X-ray dataset from the Al-Huda Digital X-ray Laboratory. Parallel hybrids (Xception–DeiT, a data-efficient image transformer) achieved the strongest internal performance (accuracy 88%), while sequential DenseNet–ViT generalized best in zero-shot transfer. After light fine-tuning, parallel hybrids achieved near-perfect accuracy (98%) and recall (1.00). Statistical analyses showed no significant difference between the parallel and sequential models (McNemar’s test), while backbone selection played a key role in performance. The Wilcoxon test found no significant difference in recall and F1-score between image and patient-level evaluations, suggesting balanced performance across both levels. Sequential hybrids achieved up to 7× faster inference than parallel models on the MURA test set while maintaining similar GPU memory usage (3.7 GB). Both fusion strategies produced clinically meaningful saliency maps that highlighted relevant wrist regions. These findings present the first systematic comparison of CNN–ViT fusion strategies for wrist anomaly detection, clarifying trade-offs between accuracy, generalization, interpretability, and efficiency in clinical AI.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

COVID-19 diagnosis using AIArtificial Intelligence in Healthcare and EducationAnomaly Detection Techniques and Applications
Volltext beim Verlag öffnen