This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
ROI-guided relational YOLO–SegNet transformer for lightweight bone tumor segmentation and classification from X-ray images
Citations: 0 · Authors: 4 · Year: 2026
Abstract
Bone tumor detection from X-ray images is challenging due to noise, low contrast, and irregular tumor boundaries that complicate precise segmentation. This study proposes a lightweight Relational YOLO–SegNet framework integrating an Optimized Savitzky–Golay Digital Filter (OSGDF), an ROI-restricted Relational Transformer Block (RTrB), and Fire Hawk Election Optimizer (FHEO)-based hyperparameter tuning. The framework operates as an ROI-guided detection–segmentation pipeline: YOLOv8 first localizes tumor regions, after which the Relational YOLO–SegNet model performs precise pixel-level segmentation. Image-level classification of normal versus tumor cases is then derived from the segmented regions, making segmentation the primary objective of the framework. The distinctive contribution of OSGDF lies in its adaptive parameter selection via Tunicate Swarm Optimization, which improves noise suppression while preserving edge sharpness, raising the SNR from 21.4 dB to 29.6 dB and enhancing boundary delineation prior to segmentation. The proposed model applies relational attention only to YOLO-detected tumor regions rather than to the entire image token space, reducing computational complexity while retaining long-range contextual modeling. The framework contains 12.3 million trainable parameters, fewer than conventional encoder–decoder architectures such as U-Net and Mask R-CNN. Experiments on a publicly available dataset of 809 X-ray images (421 normal, 388 tumor) with expert-provided pixel-level annotations achieved 98.5% accuracy, 98.32% precision, 98.83% sensitivity (recall), 98.21% specificity, 98.57% F1-score, 97% Dice score, 97.1% Jaccard index, and an AUC of 0.981 under five-fold cross-validation (accuracy 98.5 ± 0.3% across folds). Statistical analysis confirmed that the improvements over baseline models were significant (p < 0.05). The model achieved an inference time of 48 ms per image on an NVIDIA RTX 3090 GPU (24 GB VRAM), demonstrating computational efficiency suitable for resource-constrained deployment scenarios. While the results indicate strong dataset-specific performance, external multi-institutional validation is required before clinical translation. The framework may serve as a research-support tool for automated bone tumor analysis using X-ray imaging.
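The abstract describes two mechanisms concrete enough to sketch. First, the OSGDF preprocessing is at heart a Savitzky–Golay filter whose window length and polynomial order are tuned (in the paper, by Tunicate Swarm Optimization). A minimal separable 2D version using SciPy, with illustrative defaults rather than the published settings, might look like this:

```python
import numpy as np
from scipy.signal import savgol_filter

def osgdf_denoise(image: np.ndarray, window: int = 11, order: int = 3) -> np.ndarray:
    """Separable Savitzky-Golay smoothing of a 2D X-ray image.

    `window` and `order` stand in for the OSGDF hyperparameters that the
    paper tunes with Tunicate Swarm Optimization; the defaults here are
    illustrative assumptions, not the published settings.
    """
    # Filter along columns, then rows; window must be odd and exceed order.
    smoothed = savgol_filter(image.astype(np.float64), window, order, axis=0)
    return savgol_filter(smoothed, window, order, axis=1)
```

Second, the ROI-restricted Relational Transformer Block can be read as self-attention over only the feature tokens that fall inside the YOLO-detected box, so the quadratic attention cost scales with the M ROI tokens rather than all N image tokens. The sketch below assumes a flattened token grid and a precomputed boolean ROI mask; the layer sizes and the box-to-mask mapping are hypothetical, not taken from the paper:

```python
import torch
import torch.nn as nn

class ROIRelationalAttention(nn.Module):
    """Multi-head self-attention restricted to tokens inside a detected ROI."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Illustrative sizes; the paper's RTrB configuration is not specified here.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, roi_mask: torch.Tensor) -> torch.Tensor:
        # tokens: (1, N, dim) flattened feature-map tokens for one image
        # roi_mask: (N,) bool, True for tokens inside the YOLO-detected box
        roi_tokens = tokens[:, roi_mask]             # (1, M, dim), M << N
        attended, _ = self.attn(roi_tokens, roi_tokens, roi_tokens)
        out = tokens.clone()
        out[:, roi_mask] = attended                  # write refined tokens back
        return out
```

Note that because attention weights are shared across tokens, restricting attention to the ROI cuts computation rather than parameter count; the 12.3 million-parameter footprint reported in the abstract comes from the architecture itself.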
Similar works
Deep Residual Learning for Image Recognition
2016 · 216,663 citations
U-Net: Convolutional Networks for Biomedical Image Segmentation
2015 · 86,244 citations
ImageNet classification with deep convolutional neural networks
2017 · 75,550 citations
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014 · 75,406 citations
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
2016 · 52,833 citations