This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
In-Context Bias Propagation in LLM-Based Tabular Data Generation
Citations: 0
Authors: 7
Year: 2025
Abstract
Large Language Models (LLMs) are increasingly used for synthetic tabular data generation through in-context learning (ICL), offering a practical solution for data augmentation in data-scarce scenarios. While prior work has shown the potential of LLMs to improve downstream task performance through augmenting underrepresented groups, these benefits often assume access to a subset of unbiased in-context examples, representative of the real dataset. In real-world settings, however, data is frequently noisy and demographically skewed. In this paper, we systematically study how statistical biases within in-context examples propagate to the distribution of synthetic tabular data, showing that even mild in-context biases lead to global statistical distortions. We further introduce an adversarial scenario where a malicious contributor can inject bias into the synthetic dataset via a subset of in-context examples, ultimately compromising the fairness of downstream classifiers for a targeted and protected subgroup. Finally, we evaluate mitigation strategies based on preprocessing in-context examples, demonstrating that while such interventions can attenuate disparity, the inherent sensitivity of LLMs to adversarial prompts remains a persistent challenge. Our findings highlight a critical new vulnerability in LLM-based data generation pipelines within sensitive domains.
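The kind of in-context skew the abstract describes can be illustrated with a small sketch. It is not the authors' method: the toy rows, the column names `group` and `label`, the helper names, and the choice of a positive-rate gap as the bias measure are all assumptions made for illustration. The sketch builds a balanced toy table, draws a set of in-context examples while withholding positive-label rows of one subgroup, and compares the positive-rate gap before and after.

```python
import random


def positive_rate(rows, group):
    """Share of rows in `group` with label 1 (NaN if the group is absent)."""
    labels = [r["label"] for r in rows if r["group"] == group]
    return sum(labels) / len(labels) if labels else float("nan")


def parity_gap(rows, group_a="A", group_b="B"):
    """Difference in positive rates between two groups (0.0 means parity)."""
    return positive_rate(rows, group_a) - positive_rate(rows, group_b)


def sample_icl_examples(rows, k, target_group, drop_positive_prob, rng):
    """Hypothetical biased sampler: draws up to k rows in random order, skipping
    positive-label rows of `target_group` with probability `drop_positive_prob`."""
    pool = list(rows)
    rng.shuffle(pool)
    chosen = []
    for row in pool:
        if len(chosen) == k:
            break
        is_target_positive = row["group"] == target_group and row["label"] == 1
        if is_target_positive and rng.random() < drop_positive_prob:
            continue  # withhold this example from the prompt
        chosen.append(row)
    return chosen


# Toy "real" table (assumed data): 100 rows per group, half of each positive.
real_rows = [
    {"group": "A" if i % 2 == 0 else "B", "label": 1 if (i // 2) % 2 == 0 else 0}
    for i in range(200)
]

rng = random.Random(0)
biased = sample_icl_examples(
    real_rows, k=40, target_group="B", drop_positive_prob=1.0, rng=rng
)

print("gap in real data:", parity_gap(real_rows))
print("gap in biased in-context examples:", parity_gap(biased))
```

Lowering `drop_positive_prob` toward 0 gives milder skews. Applying the same gap to rows generated by an LLM from such a prompt would be one rough way to check how much of the skew carries over, though the paper's own metrics may differ.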