This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Building a Comprehensive Robust Framework for Predictive Machine Learning Models Development Using Real-World Clinical Data
Citations: 0 · Authors: 5 · Year: 2025
Abstract
This study presents a comprehensive and largely automated framework for the development, evaluation, and validation of prediction models using machine learning (ML) algorithms and real-world clinical data. Specifically, the framework was designed to predict preventable hospitalizations in patients with arterial hypertension (AH) and its complications. This is a critical clinical task given the significant economic and social costs of inpatient treatment for these patients. Cardiology still faces the challenge of developing widely accepted prognostic scales for patients with AH, and ML methods offer promising solutions to this problem. The framework was tested on a large dataset of 1,165,770 depersonalized electronic health records from 151,492 patients with AH, considering 43 potential predictors. It includes essential steps such as preprocessing (missing value imputation, scaling, and class imbalance correction), optimal model selection and testing, and external validation, with a clear and unified approach to selecting the best model. The XGBoost algorithm combined with random undersampling showed the best performance and stability on external data, with an area under the receiver operating characteristic curve (AUROC) of 0.815 (95% CI 0.797-0.835). This demonstrates its potential for close monitoring of high-risk patients, early preventive interventions, and optimized medical care.
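The abstract names the pipeline steps but not how they were implemented. As a minimal, illustrative sketch (standard-library Python only, not the authors' code), the block below shows two of them: random undersampling of the majority class, and AUROC with a 95% confidence interval. The function names `random_undersample`, `auroc`, and `bootstrap_auroc_ci` are hypothetical. The percentile bootstrap is an assumption, since the abstract does not state how its confidence interval was computed. In practice, imbalanced-learn's `RandomUnderSampler` and scikit-learn's `roc_auc_score` provide equivalent operations.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(X: Sequence, y: Sequence[int], seed: int = 0) -> Tuple[list, List[int]]:
    """Randomly drop majority-class rows until both classes are equally sized.

    Assumes binary 0/1 labels. Intended for the training split only; validation
    and external test sets keep their real class balance.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equal-sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))  # keep original row order
    return [X[i] for i in kept], [y[i] for i in kept]


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties.

    Equals the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case (ties count as one half).
    """
    n = len(scores)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at sorted position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_auroc_ci(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point AUROC plus a percentile-bootstrap (1 - alpha) confidence interval.

    Assumption: the percentile bootstrap is an illustrative choice; the
    abstract does not say which CI method the study used.
    """
    point = auroc(y_true, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # skip resamples that contain only one class
            stats.append(auroc(y_b, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(round(alpha / 2 * n_boot))]
    hi = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return point, lo, hi


if __name__ == "__main__":
    # Toy labels and scores for illustration only; not data from the study.
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
    print(bootstrap_auroc_ci(y, s, n_boot=500))
```

In this sketch, undersampling changes only the data a model is trained on. AUROC is then estimated on data that keeps its original class balance, as an external validation set would.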
Similar works
Biostatistical Analysis
1996 · 35,446 citations
UCI Machine Learning Repository
2007 · 24,290 citations
An introduction to ROC analysis
2005 · 20,689 citations
The use of the area under the ROC curve in the evaluation of machine learning algorithms
1997 · 7,122 citations
A method of comparing the areas under receiver operating characteristic curves derived from the same cases.
1983 · 7,065 citations