OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 06.04.2026, 06:26

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning\n Research

2021·57 Zitationen·arXiv (Cornell University)Open Access
Volltext beim Verlag öffnen

57

Zitationen

4

Autoren

2021

Jahr

Abstract

Benchmark datasets play a central role in the organization of machine\nlearning research. They coordinate researchers around shared research problems\nand serve as a measure of progress towards shared goals. Despite the\nfoundational role of benchmarking practices in this field, relatively little\nattention has been paid to the dynamics of benchmark dataset use and reuse,\nwithin or across machine learning subcommunities. In this paper, we dig into\nthese dynamics. We study how dataset usage patterns differ across machine\nlearning subcommunities and across time from 2015-2020. We find increasing\nconcentration on fewer and fewer datasets within task communities, significant\nadoption of datasets from other tasks, and concentration across the field on\ndatasets that have been introduced by researchers situated within a small\nnumber of elite institutions. Our results have implications for scientific\nevaluation, AI ethics, and equity/access within the field.\n

Ähnliche Arbeiten

Autoren

Themen

Ethics and Social Impacts of AIExplainable Artificial Intelligence (XAI)Artificial Intelligence in Healthcare and Education
Volltext beim Verlag öffnen