This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Understanding Machine Learning testing in practice
Citations: 0
Authors: 4
Year: 2026
Abstract
Machine Learning is increasingly embedded in critical software systems, making their quality assurance a matter of growing concern. While the research community has proposed several techniques for testing ML-enabled systems, there is limited empirical evidence on whether these techniques are adopted in practice or align with developers' testing workflows. This paper presents a two-step empirical investigation aimed at characterizing the current landscape of ML testing in real-world development. Our goal is to understand how developers approach testing, whether proposed techniques are adopted, and what barriers hinder their implementation. We designed a mixed-method study that triangulates insights from two complementary sources: (1) a mining study of 398 open-source repositories to analyze implemented testing strategies and tool usage; and (2) a survey of 100 practitioners to capture perceptions, motivations, and practical challenges. Our findings reveal that developers rely heavily on foundational strategies like Smoke Testing and Rule-Based Checking, implemented through custom testing logic built on general-purpose libraries (e.g., PyTest, NumPy). Conversely, we identified a critical adoption gap in specialized tools and advanced techniques such as Metamorphic Testing, which are rarely implemented despite their academic prominence. Our survey indicates that this gap is driven by practical barriers, including high integration costs and a poor fit with existing developer workflows. These findings suggest that future research and tooling must prioritize usability, integration, and a clearer alignment with the pragmatic needs of developers.
• Large-scale mixed-method investigation of ML testing practices in real-world development.
• Triangulated insights from 398 open-source repositories (2,018 test files) and 100 practitioners.
• Practitioners rely on foundational strategies like Smoke Testing, implemented via custom solutions.
• Critical adoption gap for specialized tools and advanced techniques due to workflow integration barriers.
• Released datasets, analysis scripts, and a technical report to enable replication.
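For readers unfamiliar with the three strategies named in the abstract, the following is a minimal illustrative sketch of what each might look like as custom test logic. It is not taken from the paper or its dataset: the toy model `predict_proba`, the helper `make_batch`, and the specific checks are all hypothetical, and the example uses only the Python standard library so that it runs as-is. The plain `assert`-based `test_*` functions follow the naming convention that PyTest discovers.

```python
import math
import random


def predict_proba(rows):
    """Hypothetical stand-in model: softmax over a fixed linear map (3 features, 2 classes)."""
    weights = [(1.0, -1.0), (0.5, 0.25), (-0.75, 1.5)]
    result = []
    for row in rows:
        logits = [sum(f * w[c] for f, w in zip(row, weights)) for c in range(2)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def make_batch(n, seed=0):
    """Hypothetical helper: n rows of 3 Gaussian features, reproducible via the seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]


def test_smoke():
    # Smoke test: the model runs end to end and returns finite output of the expected shape.
    out = predict_proba(make_batch(4))
    assert len(out) == 4 and all(len(r) == 2 for r in out)
    assert all(math.isfinite(p) for r in out for p in r)


def test_rule_based():
    # Rule-based check: every output row must be a valid probability distribution.
    out = predict_proba(make_batch(16, seed=1))
    for r in out:
        assert all(0.0 <= p <= 1.0 for p in r)
        assert math.isclose(sum(r), 1.0, rel_tol=1e-9)


def test_metamorphic_row_permutation():
    # Metamorphic relation: reordering the input rows must reorder the outputs identically,
    # so no expected labels (test oracle) are needed.
    batch = make_batch(8, seed=2)
    order = list(range(8))
    random.Random(3).shuffle(order)
    original = predict_proba(batch)
    permuted = predict_proba([batch[i] for i in order])
    for new_pos, old_pos in enumerate(order):
        for a, b in zip(permuted[new_pos], original[old_pos]):
            assert math.isclose(a, b, rel_tol=1e-9)
```

The first two tests need nothing beyond assertions on the model's output, which is consistent with the abstract's observation that such checks are typically built on general-purpose libraries. The metamorphic test additionally requires choosing a relation that the model is expected to satisfy, and that choice is specific to each model.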
Similar works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 21,167 citations
Generative Adversarial Nets
2023 · 19,896 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,397 citations
"Why Should I Trust You?"
2016 · 14,897 citations
Generative adversarial networks
2020 · 13,432 citations