Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset
85
Zitationen
4
Autoren
2020
Jahr
Abstract
OBJECTIVE: The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. MATERIALS AND METHODS: We used the publicly available nCov2019 dataset, including patient-level data from several countries. We aimed to the discovery and classification of severity subgroups using symptoms and comorbidities. RESULTS: Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect the model target populations and increase model complexity at risk of overfitting. CONCLUSIONS: Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.
Ähnliche Arbeiten
La certeza de lo impredecible: Cultura Educación y Sociedad en tiempos de COVID19
2020 · 19.307 Zit.
A Multi-Modal Distributed Real-Time IoT System for Urban Traffic Control (Invited Paper)
2024 · 14.300 Zit.
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
2018 · 8.790 Zit.
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
2021 · 7.402 Zit.
scikit-image: image processing in Python
2014 · 6.836 Zit.