This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Probabilistic Versus Deep Generative Models: A Fairness Centred Evaluation of Synthetic Healthcare Tabular Data

2026 · 0 citations · International Journal of Computational Intelligence Systems · Open Access

Abstract

Synthetic data offers a promising avenue for addressing privacy, scarcity, and fairness challenges in healthcare datasets. However, there is limited evaluation of how different generation methods balance fidelity, utility, and fairness, particularly for underrepresented subgroups. This study addresses this gap by comparing representative generative modelling techniques, both probabilistic and deep approaches, that are popular in the research literature. We empirically evaluate BayesBoost, CTGAN, TVAE, CopulaGAN, and DECAF on two healthcare datasets containing numerical, binary, and categorical features. Each model's performance is assessed along three axes: data fidelity, machine learning utility, and fairness, with fairness measured using Accuracy Parity, Equalised Odds, and Predictive Rate Parity. Results show that BayesBoost consistently achieved superior fidelity, utility, and fairness preservation, particularly when paired with Random Forest classifiers, with around 60–63% higher downstream utility than GAN-based deep generative baselines (e.g., Random Forest accuracy up to 0.88 with BayesBoost versus 0.54 to 0.55 for GAN-based methods). Deep generative models, while effective in capturing complex structures, often degraded fairness, especially for underrepresented groups, with equalised odds deviating by over 100% from the ideal parity value of 1.0 in some settings. The Variational Autoencoder outperformed the other deep generative models in fairness preservation, especially for equalised odds, although with some reduction in fidelity and utility. Overall, these findings suggest that synthetic data generation for healthcare must move beyond fidelity evaluations to explicitly assess fairness and subgroup impacts, with probabilistic models such as BayesBoost showing strong potential for ethical deployment, while deep generative models require further adaptation for fairness-sensitive applications.
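The abstract reports fairness as parity values with an ideal of 1.0, but does not give the exact formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes each metric is the ratio of a per-group rate (unprivileged over privileged) for a binary classifier. The function names, the toy labels, and the choice to report equalised odds as two separate ratios (true positive rate and false positive rate) are all illustrative assumptions.

```python
from collections import Counter


def confusion(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive)."""
    c = Counter(zip(y_true, y_pred))
    return {"tp": c[(1, 1)], "fp": c[(0, 1)], "tn": c[(0, 0)], "fn": c[(1, 0)]}


def rates(y_true, y_pred):
    """Per-group rates that the three fairness metrics compare.

    Note: raises ZeroDivisionError if a group has no positives,
    no negatives, or no positive predictions.
    """
    m = confusion(y_true, y_pred)
    total = sum(m.values())
    return {
        "accuracy": (m["tp"] + m["tn"]) / total,
        "tpr": m["tp"] / (m["tp"] + m["fn"]),  # true positive rate
        "fpr": m["fp"] / (m["fp"] + m["tn"]),  # false positive rate
        "ppv": m["tp"] / (m["tp"] + m["fp"]),  # positive predictive value
    }


def fairness_ratios(y_true, y_pred, group, unprivileged, privileged):
    """Ratio of unprivileged to privileged rates; 1.0 means parity (assumed form)."""

    def subset(g):
        idx = [i for i, v in enumerate(group) if v == g]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    u = rates(*subset(unprivileged))
    p = rates(*subset(privileged))
    return {
        "accuracy_parity": u["accuracy"] / p["accuracy"],
        "equalised_odds_tpr": u["tpr"] / p["tpr"],
        "equalised_odds_fpr": u["fpr"] / p["fpr"],
        "predictive_rate_parity": u["ppv"] / p["ppv"],
    }


# Toy data (hypothetical): four records per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 1]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

ratios = fairness_ratios(y_true, y_pred, group, unprivileged="B", privileged="A")
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Under this ratio form, a value of 0.5 or 2.0 is far from parity, and a deviation of more than 100% corresponds to a ratio above 2.0. In an evaluation like the one described, such ratios would typically be computed for a classifier trained on synthetic data and compared with the same ratios for one trained on the real data.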

Institutions

Brunel University of London (GB)

Topics

Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Privacy-Preserving Technologies in Data
