OpenAlex · Updated hourly · Last updated: 26 May 2026, 02:08

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Temporal Reproducibility of a Genetic Algorithm–Derived Health Risk Score: Standardized Out-of-Fold Validation Framework (2021-2023)

2026 · 0 citations · JMIR Bioinformatics and Biotechnology · Open Access

Citations: 0 · Authors: 4 · Year: 2026

Abstract

Background: Genetic algorithm (GA)-based scoring has been proposed as a data-driven approach for health risk stratification. However, performance estimates may be inflated when preprocessing, optimization, and evaluation are not strictly separated within a prespecified validation framework. Demonstrating temporal reproducibility under a standardized out-of-fold (OOF) evaluation framework with transparent uncertainty quantification is therefore essential for ensuring translational reliability in preventive health screening.

Objective: This study aimed to evaluate the temporal reproducibility of a GA-derived composite health risk score across three consecutive annual cohorts (2021-2023) under a standardized OOF validation pipeline and to assess robustness to policy-driven structural HbA1c missingness through a prespecified ON/OFF sensitivity analysis.

Methods: Annual health examination datasets from 2021 (n=3744), 2022 (n=5153), and 2023 (n=5352) were analyzed using an identical preprocessing and modeling pipeline. Thirteen clinical indicators and eight lifestyle questionnaire variables were included as predictors. The outcome was based on an A-D grading framework and binarized using an OR rule across domains (grade ≥B in any domain). Continuous variables were median-imputed and standardized within each training fold to prevent information leakage. GA optimization was performed using fixed random seeds, and fitness estimation employed stratified K-fold cross-validation. Predicted probabilities were obtained by fitting logistic regression models to GA-derived composite scores within the OOF framework. Discrimination was quantified using the area under the receiver operating characteristic curve (AUC) and overall predictive performance using the Brier score, both calculated from OOF predicted probabilities. Uncertainty was estimated using 2,000-replicate percentile bootstrap resampling. A prespecified sensitivity analysis excluded HbA1c while maintaining an identical evaluation framework.

Results: OOF AUC values were stable across cohorts (2021: 0.810; 2022: 0.814; 2023: 0.812), with overlapping 95% percentile bootstrap confidence intervals. Brier scores ranged from 0.172 to 0.176. Exclusion of HbA1c resulted in small changes in discrimination (median ΔAUC ≤0.007) in the prespecified ON/OFF sensitivity analysis.

Conclusions: Under a harmonized OOF validation framework, the GA-derived composite risk score showed stable temporal discrimination and consistent overall predictive performance across three consecutive annual cohorts. These findings underscore the methodological importance of prespecified, standardized evaluation procedures and transparent uncertainty quantification when assessing reproducibility of risk stratification models in routine health screening data.
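
The evaluation steps named in the Methods (OR-rule binarization, AUC and Brier score on OOF probabilities, and a percentile bootstrap interval) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the synthetic demo data, the order-statistic percentile rule, and the reading of "grade ≥B" as "B, C, or D" (with A as the best grade) are all assumptions made for illustration. The GA optimization, fold-wise preprocessing, and logistic regression steps are not shown.

```python
import random
from statistics import mean

def binarize_or_rule(grades_by_domain):
    """Return 1 if any domain is graded B or worse, else 0.
    ASSUMPTION: A is the best grade, so "grade >= B" means B, C, or D."""
    return int(any(g in ("B", "C", "D") for g in grades_by_domain))

def auc(y_true, y_prob):
    """AUC via the Mann-Whitney rank statistic; ties get average ranks."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    ranks = [0.0] * len(y_prob)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_prob[order[j + 1]] == y_prob[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return mean((p - y) ** 2 for y, p in zip(y_true, y_prob))

def percentile_bootstrap_ci(y_true, y_prob, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute the
    metric, and take empirical quantiles of the replicates."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking a class (AUC undefined)
            stats.append(metric(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int(round(alpha / 2 * n_boot)))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]

# Synthetic stand-in for OOF predictions; NOT data from the study.
demo_rng = random.Random(42)
y_demo = [demo_rng.randint(0, 1) for _ in range(200)]
p_demo = [min(max(0.5 + 0.25 * (2 * y - 1) + demo_rng.gauss(0, 0.2), 0.0), 1.0)
          for y in y_demo]

print("AUC:", auc(y_demo, p_demo),
      "95% CI:", percentile_bootstrap_ci(y_demo, p_demo, auc))
print("Brier:", brier(y_demo, p_demo),
      "95% CI:", percentile_bootstrap_ci(y_demo, p_demo, brier))
```

In a leakage-free pipeline, `y_prob` would hold only probabilities predicted for cases that were held out when the corresponding model was fitted, so the bootstrap resamples OOF predictions rather than refitting anything.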

Topics

Genetic Associations and Epidemiology · Genomics and Rare Diseases · Artificial Intelligence in Healthcare and Education