This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Apples-to-Apples: Age-Sex Standardisation of Public Chest X-ray Datasets

2025 · 0 citations · Cureus · Open Access

Abstract

Background: Public chest radiograph datasets are widely used for model development and benchmarking, but differences in patient demographics can inflate apparent between-dataset differences in disease label prevalence.

Objective: To quantify the proportion of the NIH ChestX-ray14 versus CheXpert prevalence differences that is explained by age and sex alone.

Methods: A cross-sectional analysis of the NIH ChestX-ray14 (n=112,120 studies) and CheXpert (n=223,413) databases was performed. Sex was harmonised to Male/Female and age was categorised as 0-17, 18-39, 40-59, 60-79, and ≥80 years. Five shared labels were assessed: consolidation, atelectasis, pleural effusion, edema, and cardiomegaly. For CheXpert, label uncertainty (-1) was treated as negative in the primary analysis. For each label, we calculated crude prevalence with Wilson 95% confidence intervals and compared datasets using a two-proportion z-test. We then performed direct standardisation by reweighting CheXpert age-sex strata to the NIH age-sex distribution and reported the reduction in the crude prevalence gap attributable to age-sex adjustment.

Results: Crude prevalence was higher in CheXpert than in NIH for all labels (all p<0.001). After age-sex standardisation, CheXpert prevalence decreased for every label, indicating that demographics account for a substantial share of between-dataset differences. For consolidation, the crude gap of 1.96 percentage points (6.12% vs 4.16%) decreased to a standardised gap of 1.47 percentage points (CheXpert standardised 5.63% vs NIH 4.16%), a reduction of approximately 25%. For atelectasis, the gap declined from 4.85 to 2.84 percentage points (approximately 41% reduction). For pleural effusion, the gap declined from 28.10 to 19.03 percentage points (approximately 32% reduction). For edema, the gap declined from 21.70 to 14.78 percentage points (approximately 32% reduction). For cardiomegaly, the gap declined from 9.45 to 6.55 percentage points (approximately 31% reduction). Across labels, age-sex standardisation explained approximately 25% to 40% of the crude prevalence differences.

Conclusion: A simple age-sex standardisation step explains a large proportion of apparent label prevalence differences between NIH ChestX-ray14 and CheXpert. Routine reporting of standardised prevalence alongside crude estimates and demographic composition can improve fairness and interpretability in cross-dataset benchmarking and reduce the risk of attributing demographic composition effects to labelling or model performance.
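
The statistical steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. All function names are hypothetical, the toy stratum values are invented for illustration, and the pooled standard error in the z-test is an assumption, since the abstract does not say which variant was used. Only the consolidation figures in the last lines come from the abstract.

```python
import math

def wilson_ci(k, n, z=1.959964):
    """Wilson score interval for k positives out of n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def two_proportion_z(k1, n1, k2, n2):
    """Two-sided two-proportion z-test with a pooled standard error (assumed)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

def direct_standardise(stratum_prev, ref_counts):
    """Weight source-dataset stratum prevalences by the reference
    dataset's stratum shares (direct standardisation).
    Raises KeyError if a reference stratum is missing from the source."""
    total = sum(ref_counts.values())
    return sum(stratum_prev[s] * ref_counts[s] / total for s in ref_counts)

def gap_reduction(crude_src, std_src, ref):
    """Fraction of the crude gap removed by standardisation."""
    return 1 - (std_src - ref) / (crude_src - ref)

# Toy example with two made-up (sex, age band) strata; not real data.
toy_prev = {("M", "60-79"): 0.10, ("F", "18-39"): 0.02}  # source prevalence
toy_ref = {("M", "60-79"): 300, ("F", "18-39"): 700}     # reference counts
toy_std = direct_standardise(toy_prev, toy_ref)

# Consolidation figures from the abstract: crude 6.12%, standardised 5.63%,
# NIH 4.16%. The result is about 0.25, matching the reported ~25%.
consolidation_reduction = gap_reduction(6.12, 5.63, 4.16)
print(f"{consolidation_reduction:.0%}")
```

In the study's setting, `stratum_prev` would hold CheXpert prevalence per age-sex stratum and `ref_counts` the NIH ChestX-ray14 stratum counts, giving ten strata from two sexes and five age bands.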

Topics

COVID-19 diagnosis using AI; Machine Learning in Healthcare; Artificial Intelligence in Healthcare and Education
