This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Ensuring Data Integrity: The Role of Data Engineering and Pipelines in Labeling AI-Generated Images and Videos
0
Citations
1
Authors
2024
Year
Abstract
Artificial Intelligence (AI) models such as Generative Adversarial Networks (GANs) have proliferated and shown impressive success in image synthesis. This capability can enhance content and media but also poses threats to legitimacy, authenticity, and security. As AI transitions from research to deployment, creating appropriate datasets and data pipelines to develop and evaluate AI models is increasingly the main challenge. Publicly available automated AI model builders can now achieve top performance in many applications. This paper discusses the importance of data engineering and pipelines in creating curated and clean data services, emphasizing the role of labeling AI-generated content to mitigate misinformation. It summarizes referenced findings from large-scale experiments on labeling effectiveness and highlights challenges in designing, evaluating, and implementing labeling policies. Key considerations for each stage of the data-for-AI pipeline, from data design through data sculpting (for example, cleaning, valuation, and annotation) to data evaluation, are discussed to make AI more reliable.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,549 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,443 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,941 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations