This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

When external validation isn’t enough: Simpson’s paradox, direction asymmetry, and calibration collapse in cross-continental perioperative mortality prediction

2025 · 0 citations · medRxiv · Open Access

Abstract

Objective. To test whether stratified within-cohort analysis, bidirectional external validation, and case-level paired bootstrap inference, applied jointly, reveal failure-mode magnitudes in cross-continental clinical prediction that aggregate metrics conceal.

Materials and Methods. Eight machine learning models (XGBoost and logistic regression on preoperative and preoperative+intraoperative feature sets, trained separately on INSPIRE (Korea; n = 127,413) and on MOVER (USA; n = 57,545)) were evaluated bidirectionally between the two cohorts. Case-level paired bootstrap (2,000 iterations) was the primary inferential framework. Direction asymmetry was stress-tested by matched subsampling across four case-mix dimensions: ASA physical status, Elixhauser comorbidity, emergency proportion, and temporal period. Feature-importance transferability was assessed with SHAP rank correlation; calibration was assessed with slope, intercept, observed-to-expected (O:E) ratio, and Brier score before and after Platt scaling.

Results. Across the eight cross-population runs, aggregate AUCs concealed substantially lower within-stratum AUCs (Simpson’s paradox gaps of 5.0–16.5 percentage points [pp]; worst case: aggregate AUC 0.756 vs within-stratum AUCs 0.58–0.60). Cross-continental transferability was markedly asymmetric by direction (+8.53 pp, 95% CI 6.91–10.24 pp; bootstrap p = 0.001). The asymmetry persisted after matching on ASA physical status and comorbidity, attenuated to 70–86% of its baseline magnitude after matching on emergency proportion, and could not be tested for temporal period. Before Platt scaling, calibration slopes ranged from 0.41 to 1.29 across all eight cross-population runs; 5-fold cross-validated Platt scaling restored slopes to 0.95–1.02. Intraoperative features conferred a mean external AUC advantage of +3.60 pp (95% CI +2.75 to +4.39 pp; bootstrap p = 0.001).

Discussion. These magnitudes are clinically material and are not visible in conventional aggregate reporting. Each methodological commitment that surfaces them is individually well established; applied jointly, they characterize failure-mode magnitudes that any single commitment would underestimate.

Conclusion. We present this case study as a cautionary reference for the cross-population deployment of clinical prediction models, and we release reproducibility infrastructure for verification and extension.
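The case-level paired bootstrap named in the abstract can be illustrated with a minimal sketch. It assumes two models have scored the same external cases, so each replicate resamples case indices once and reuses them for both models; that sharing is what makes the comparison "paired". The function names, the percentile CI, and the two-sided p-value convention (with a +1 correction) are assumptions for illustration, not the study's released code; the abstract does not specify these details.

```python
"""Case-level paired bootstrap for the AUC difference between two models.

Illustrative sketch only: function names are hypothetical, not the study's code.
Both models must have scored the same cases.
"""
import math
import random
from typing import List, Sequence, Tuple


def auc(y: Sequence[int], score: Sequence[float]) -> float:
    """ROC AUC from the Mann-Whitney U statistic; tied scores get average ranks."""
    n = len(y)
    order = sorted(range(n), key=lambda i: score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and score[order[j + 1]] == score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return math.nan  # AUC is undefined when one class is absent
    rank_sum_pos = sum(r for r, t in zip(ranks, y) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_bootstrap_auc_diff(
    y: Sequence[int],
    score_a: Sequence[float],
    score_b: Sequence[float],
    n_boot: int = 2000,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float], float]:
    """Return (observed AUC_a - AUC_b, 95% percentile CI, two-sided bootstrap p)."""
    rng = random.Random(seed)
    n = len(y)
    observed = auc(y, score_a) - auc(y, score_b)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one resample shared by both models
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            diffs.append(
                auc(yb, [score_a[i] for i in idx]) - auc(yb, [score_b[i] for i in idx])
            )
    if not diffs:
        raise ValueError("every bootstrap resample contained a single class")
    diffs.sort()
    m = len(diffs)
    ci = (diffs[int(0.025 * (m - 1))], diffs[int(0.975 * (m - 1))])
    # Two-sided p: share of replicates on the far side of zero, with a +1 correction.
    below = sum(d <= 0 for d in diffs)
    above = sum(d >= 0 for d in diffs)
    p = min(1.0, 2 * (min(below, above) + 1) / (m + 1))
    return observed, ci, p
```

For instance, passing the preoperative+intraoperative model's external-cohort risks as `score_a` and the preoperative-only model's risks as `score_b` would yield a paired estimate of the same kind as the reported intraoperative-feature advantage. Note that with this convention and 2,000 replicates, the smallest attainable p is about 0.001.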

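The calibration quantities in the abstract can be computed under common textbook definitions; the abstract does not define them, so the following are assumptions. The calibration slope is the coefficient from a logistic regression of the outcome on logit(predicted risk). The calibration intercept here is calibration-in-the-large (slope fixed at 1 via a logit offset). O:E is observed events divided by the sum of predicted risks, and the Brier score is the mean squared error of the predicted risk. Platt-style scaling is sketched as a logistic refit on logit(p), whereas Platt's original formulation fits a sigmoid to raw classifier scores. It is applied out-of-fold with 5 folds. All function names and the fold assignment are hypothetical.

```python
"""Calibration slope, intercept, O:E, Brier, and cross-fitted Platt-style scaling.

Illustrative sketch only: function names are hypothetical, not the study's code.
Pure standard library, unregularized Newton-Raphson fits.
"""
import math
import random
from typing import Dict, List, Sequence, Tuple

EPS = 1e-12


def logit(p: float) -> float:
    p = min(max(p, EPS), 1 - EPS)  # clip to avoid log(0)
    return math.log(p / (1 - p))


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1 + e)


def fit_logistic(x: Sequence[float], y: Sequence[int], iters: int = 50) -> Tuple[float, float]:
    """Newton-Raphson fit of P(y = 1) = sigmoid(a + b * x); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = sigmoid(a + b * xi)
            w = mu * (1 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break  # singular information matrix (e.g. separation); stop early
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def calibration_in_the_large(p: Sequence[float], y: Sequence[int], iters: int = 50) -> float:
    """Intercept a in P(y = 1) = sigmoid(a + logit(p)), i.e. with the slope fixed at 1."""
    offset = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mus = [sigmoid(a + o) for o in offset]
        step = sum(yi - m for yi, m in zip(y, mus)) / sum(m * (1 - m) for m in mus)
        a += step
        if abs(step) < 1e-10:
            break
    return a


def calibration_metrics(p: Sequence[float], y: Sequence[int]) -> Dict[str, float]:
    return {
        "slope": fit_logistic([logit(pi) for pi in p], y)[1],
        "intercept": calibration_in_the_large(p, y),
        "o_e": sum(y) / sum(p),  # observed events / expected events
        "brier": sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y),
    }


def cross_fitted_platt(p: Sequence[float], y: Sequence[int], k: int = 5, seed: int = 0) -> List[float]:
    """Recalibrate each fold with a logistic fit on logit(p) from the other k - 1 folds."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    out = [0.0] * n
    for f in range(k):
        held = set(idx[f::k])
        train = [i for i in range(n) if i not in held]
        a, b = fit_logistic([logit(p[i]) for i in train], [y[i] for i in train])
        for i in held:
            out[i] = sigmoid(a + b * logit(p[i]))
    return out
```

Consider a model whose predicted logits are twice the true ones. It is overconfident, with an expected slope near 0.5. Cross-fitted recalibration should move the slope back toward 1, which is qualitatively the pattern the abstract reports (0.41–1.29 before, 0.95–1.02 after).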
Authors

D Purkayastha

Institutions

Silchar Medical College and Hospital (IN)

Topics

Cardiac, Anesthesia and Surgical Outcomes; Artificial Intelligence in Healthcare and Education; Machine Learning in Healthcare
