This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Feature Selection for Identification of Risk Factors Associated with Infant Mortality
Citations: 0
Authors: 5
Year: 2022
Abstract
In infant mortality risk analysis, Machine Learning techniques such as Feature Selection can be an efficient way to make the data more interpretable and to help explain the phenomenon under study. In this paper, we developed a Machine Learning approach to identify the main risk factors for infant mortality in the local population studied, aiming to support professionals who deal directly with the event or who work with the epidemiological guidelines that may be derived from data analysis. First, we integrated the databases of the Live Birth Information System (SINASC) and the Infant Mortality Information System (SIM), covering 2006 to 2019, for the city of Vitória, ES, Brazil. Then, we used feature selection methods, such as SHAP, Feature_Importance and SelectKBest, to identify the main risk factors associated with infant mortality, and we compared the results of these algorithms with the most recent results of a 2018 meta-analysis. We observed that the results achieved by the methods, especially SHAP, match those of the meta-analysis, in which the factors that most influenced mortality were Weight, APGAR, Gestational Age and Presence of Anomalies. Therefore, interpretability techniques such as SHAP are very promising for selecting and identifying population risk factors related to infant mortality from existing databases, without the need for new population studies; in addition, this knowledge can support decision making in public health.
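To make the univariate ranking idea behind SelectKBest concrete, here is a minimal sketch that is not the authors' pipeline. It assumes a one-way ANOVA F-score as the scoring function (the kind of score scikit-learn's SelectKBest uses by default) and re-implements it with the Python standard library only. The column names, group means and the synthetic data are invented for illustration and are not taken from SINASC, SIM or the paper.

```python
import random
from statistics import mean


def f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    # Variation of the group means around the overall mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the observations around their own group mean.
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_k_best(columns, labels, k):
    """Score every column independently and return the k highest-scoring names."""
    scores = {name: f_score(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Synthetic toy data (hypothetical values): 1 = infant death, 0 = survival.
rng = random.Random(0)
labels = [1] * 50 + [0] * 150
columns = {
    "birth_weight_g": [rng.gauss(2200 if y else 3200, 400) for y in labels],
    "apgar_5min": [rng.gauss(6 if y else 9, 1.5) for y in labels],
    "noise": [rng.gauss(0, 1) for _ in labels],  # unrelated to the outcome
}

top, scores = select_k_best(columns, labels, k=2)
print("selected:", top)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: F = {scores[name]:.1f}")
```

A univariate score like this ranks each feature in isolation, so it cannot capture interactions between features; model-based attributions such as SHAP are the usual choice when such interactions matter.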