This is an overview page with metadata for this scientific work. The full article is available from the publisher.

The Challenge of Data Scarcity and Imbalanced Classes in Radiomics Performance

2025 · 0 citations · Computer Methods and Programs in Biomedicine · Open Access

Citations: 0 · Authors: 5 · Year: 2025

Abstract

Background: Radiomics holds great promise for non-invasive clinical prediction, offering insights into disease characteristics that traditional methods might miss. However, its application is often constrained by small sample sizes and class imbalance, both common in real-world datasets. These limitations can lead to model overfitting and poor generalization. This study systematically investigates the impact of these two factors, in isolation and in combination, evaluating how they affect model performance and exploring strategies to mitigate their effects, with the goal of enhancing model robustness and clinical applicability.

Methods: Three radiomics datasets, PI-CAI (prostate cancer), BraTS2021 (glioblastoma), and Hunter2023 (lung cancer), were analyzed under four experimental conditions: a baseline (balanced, fixed-size dataset), progressive class imbalance, progressive sample size reduction, and a combined scenario. Five machine learning models were evaluated, with Random Forest ultimately selected as the reference model. Class imbalance was addressed using state-of-the-art sampling techniques, and data scarcity was mitigated using Tabular Variational Autoencoders (TVAE). Performance was assessed across five metrics (sensitivity, specificity, accuracy, ROC-AUC, and balanced accuracy), with statistical significance evaluated via t-tests.

Results: Feature selection played a key role in both model performance and interpretability. The most predictive selected features were biologically plausible and dataset-specific, such as perinodular texture heterogeneity in lung cancer or gray-level non-uniformity in glioblastoma. Class imbalance significantly degraded performance, especially when no sampling was applied. Applying the best-performing sampling method, typically an undersampling strategy, consistently improved balanced accuracy and specificity. TVAE provided modest improvements under sample size reduction, but these were not statistically significant. In combined scenarios, the use of TVAE together with the best sampler yielded the highest gains, particularly under moderate data constraints.

Conclusion: Class imbalance and small sample size each impair radiomics model performance, and their effects compound under combined conditions. Although targeted sampling and augmentation strategies provide partial mitigation, model generalizability remains constrained under extreme conditions, highlighting the ongoing need for methodological advancements.
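To illustrate why the abstract reports balanced accuracy alongside plain accuracy, the following is a minimal sketch in standard-library Python. It is not the study's pipeline: the 90/10 label split, the always-negative classifier, and the helper names `summary_metrics` and `random_undersample` are assumptions made for illustration, and plain random undersampling stands in for whichever sampler the authors found best.

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summary_metrics(y_true, y_pred):
    """Compute four of the five metrics named in the abstract (ROC-AUC needs scores)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def random_undersample(labels, seed=0):
    """Keep every minority sample plus an equal-sized random subset of the majority.

    Returns sorted indices into `labels`. This is an assumed, simplest-possible
    undersampler, not the method used in the paper.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


# Hypothetical data: 90 negatives, 10 positives, and a classifier that
# always predicts the majority (negative) class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

imbalanced = summary_metrics(y_true, y_pred)

# Evaluate the same predictions on a class-balanced subset.
idx = random_undersample(y_true, seed=0)
balanced = summary_metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])

print(imbalanced)
print(balanced)
```

On the imbalanced labels the trivial classifier reaches high accuracy while its balanced accuracy stays at chance level, and the undersampled subset makes plain accuracy fall to that same level. That gap is the reason a metric averaged over both classes is more informative under class imbalance.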

Topics

Radiomics and Machine Learning in Medical Imaging · AI in cancer detection · Artificial Intelligence in Healthcare and Education