This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
AI-Driven Multi-Disease Risk Prediction Using Routine Laboratory Data in Indian Healthcare Settings
Citations: 0
Authors: 3
Year: 2025
Abstract
Early identification of non-communicable disease (NCD) risk from data acquired during routine health check-ups can enable timely intervention, especially in resource-limited healthcare systems. [1] Using the NidaanKosha dataset, which consists of 100,000+ anonymized and de-identified Indian lab reports, we developed a multi-disease risk stratification pipeline that predicts four prevalent chronic conditions: diabetes, liver dysfunction, renal impairment, and dyslipidemia. Our pipeline handles heterogeneous lab reports via LOINC coding. [5] It derives strict disease labels from clinical biomarkers and their thresholds. To avoid label leakage, the pipeline excludes the features used in the outcome definitions, so the models must learn to predict each condition from the remaining features. We train and compare Logistic Regression (LR), XGBoost, and TabNet classifiers under cross-validation. Model interpretability is evaluated using SHAP values and TabNet's feature masks. [6] In our work, XGBoost achieves the highest performance (AUC up to 0.97), with TabNet and LR also performing well. These results show that routinely collected lab data can power scalable AI screening tools. We also discuss the implications for clinical deployment, including data bias, model fairness, and the challenges of integrating AI into India's healthcare systems. [9] Planned future work includes developing a model based on longitudinal healthcare data from the MIMIC-IV dataset. [10]
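The label derivation and leakage-avoidance steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the field names (`hba1c_pct`, `fasting_glucose_mg_dl`, and so on), the helper functions, and the choice of cut-offs are assumptions made for illustration. The cut-offs shown are the commonly cited diagnostic values for diabetes (HbA1c of 6.5% or higher, fasting glucose of 126 mg/dL or higher); the abstract does not state which thresholds the paper actually uses.

```python
# Hypothetical outcome definition for one condition (diabetes).
# Keys are assumed field names; values are assumed thresholds,
# not taken from the paper.
DIABETES_RULES = {"hba1c_pct": 6.5, "fasting_glucose_mg_dl": 126.0}


def derive_label(report, rules):
    """Return 1 if any measured outcome biomarker meets its threshold,
    0 if none does, and None if no outcome biomarker was measured."""
    measured = [name for name in rules if report.get(name) is not None]
    if not measured:
        return None  # the label cannot be derived for this report
    return int(any(report[name] >= rules[name] for name in measured))


def model_features(report, rules):
    """Drop the biomarkers that define the label so a model cannot
    read the outcome directly from its inputs (label leakage)."""
    return {k: v for k, v in report.items() if k not in rules}


# Toy reports (fabricated values, for demonstration only).
reports = [
    {"hba1c_pct": 7.1, "fasting_glucose_mg_dl": 140.0,
     "alt_u_l": 35.0, "triglycerides_mg_dl": 210.0},
    {"hba1c_pct": 5.4, "fasting_glucose_mg_dl": None,
     "alt_u_l": 22.0, "triglycerides_mg_dl": 110.0},
    {"alt_u_l": 30.0, "triglycerides_mg_dl": 150.0},
]

labels = [derive_label(r, DIABETES_RULES) for r in reports]
features = [model_features(r, DIABETES_RULES) for r in reports]
```

In this sketch, reports with no outcome biomarker get a label of `None` and would be excluded from training for that condition. The remaining feature dictionaries contain only analytes that do not appear in the outcome definition.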
Related works
"Why Should I Trust You?"
2016 · 14,879 citations
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10,574 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 9,011 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,666 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,220 citations