This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Data-Centric Explainable Debiasing for Improving Fairness in Pre-trained Language Models
Citations: 0
Authors: 5
Year: 2024
Abstract
Human-like social bias of pre-trained language models (PLMs) on downstream tasks has attracted increasing attention. Potential flaws in the training data are the main factor that causes unfairness in PLMs. Existing data-centric debiasing strategies mainly leverage explicit bias words (defined as sensitive attribute words specific to demographic groups) for counterfactual data augmentation to balance the training data. However, they do not consider implicit bias words that may be associated with explicit bias words in data with complex distributions, which indirectly harms the fairness of PLMs. To this end, we propose a Data-Centric Debiasing method (named Data-Debias), which uses an explainability method to search for implicit bias words to assist in debiasing PLMs. Specifically, we compute the feature attributions of all tokens using the Integrated Gradients method, and then treat the tokens that have a large impact on the model's decision as implicit bias words. To make the search results more precise, we iteratively train a biased model to amplify the bias with each iteration. Finally, we use the implicit bias words found in the last iteration to assist in debiasing PLMs. Extensive experiments on debiasing multiple PLMs across three different classification tasks demonstrate that Data-Debias achieves state-of-the-art debiasing performance and strong generalization while maintaining predictive ability.
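The attribution step mentioned in the abstract can be illustrated with a minimal sketch of Integrated Gradients. This is not the authors' implementation: the toy classifier (mean-pooled embeddings followed by a logistic layer), the random embeddings, the all-zero baseline, the example tokens, and the `top_k` cutoff are all assumptions chosen only to keep the example self-contained. A real setup would compute gradients through a PLM instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(emb, w, b):
    # Toy classifier (assumption): mean-pool token embeddings, then a logistic layer.
    return sigmoid(emb.mean(axis=0) @ w + b)

def model_grad(emb, w, b):
    # Analytic gradient of the output probability w.r.t. every embedding entry.
    p = model(emb, w, b)
    n_tokens = emb.shape[0]
    return np.tile(p * (1.0 - p) * w / n_tokens, (n_tokens, 1))

def integrated_gradients(emb, baseline, w, b, steps=200):
    # Midpoint Riemann approximation of the path integral from baseline to input.
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(emb)
    for a in alphas:
        total += model_grad(baseline + a * (emb - baseline), w, b)
    avg_grad = total / steps
    # Sum over embedding dimensions to get one attribution score per token.
    return ((emb - baseline) * avg_grad).sum(axis=1)

# Hypothetical input: tokens and embeddings are made up for illustration.
rng = np.random.default_rng(0)
tokens = ["the", "nurse", "was", "gentle"]
emb = rng.normal(size=(len(tokens), 8))
w = rng.normal(size=8)
b = 0.1
baseline = np.zeros_like(emb)  # all-zero baseline (assumption)

attr = integrated_gradients(emb, baseline, w, b)

# Tokens with the largest absolute attribution are treated as candidates.
top_k = 2
candidates = [tokens[i] for i in np.argsort(-np.abs(attr))[:top_k]]
print(dict(zip(tokens, np.round(attr, 4))), candidates)
```

A useful sanity check is the completeness property of Integrated Gradients: the attributions should sum approximately to the difference between the model output on the input and on the baseline. How the abstract's method thresholds "large impact" is not stated here, so the top-k rule above is only a placeholder.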