OpenAlex · Updated hourly · Last updated: 27 March 2026, 12:31

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Apples-to-Apples: Age-Sex Standardisation of Public Chest X-ray Datasets

2025 · 0 citations · Cureus · Open Access

Citations: 0 · Authors: 4 · Year: 2025

Abstract

Background: Public chest radiograph datasets are widely used for model development and benchmarking, but differences in patient demographics can inflate apparent between-dataset differences in disease label prevalence.

Objective: To quantify the proportion of the prevalence differences between NIH ChestX-ray14 and CheXpert that is explained by age and sex alone.

Methods: A cross-sectional analysis of the NIH ChestX-ray14 (n=112,120 studies) and CheXpert (n=223,413) datasets was performed. Sex was harmonised to Male/Female, and age was categorised as 0-17, 18-39, 40-59, 60-79, and ≥80 years. Five shared labels were assessed: consolidation, atelectasis, pleural effusion, edema, and cardiomegaly. For CheXpert, label uncertainty (-1) was treated as negative in the primary analysis. For each label, we calculated crude prevalence with Wilson 95% confidence intervals and compared datasets using a two-proportion z-test. We then performed direct standardisation by reweighting CheXpert age-sex strata to the NIH age-sex distribution and reported the reduction in the crude prevalence gap attributable to age-sex adjustment.

Results: Crude prevalence was higher in CheXpert than in NIH for all labels (all p<0.001). After age-sex standardisation, CheXpert prevalence decreased for every label, indicating that demographics account for a substantial share of between-dataset differences. For consolidation, the crude gap of 1.96 percentage points (6.12% vs 4.16%) decreased to a standardised gap of 1.47 percentage points (CheXpert standardised 5.63% vs NIH 4.16%), a reduction of approximately 25%. For atelectasis, the gap declined from 4.85 to 2.84 percentage points (approximately 41% reduction). For pleural effusion, the gap declined from 28.10 to 19.03 percentage points (approximately 32% reduction). For edema, the gap declined from 21.70 to 14.78 percentage points (approximately 32% reduction). For cardiomegaly, the gap declined from 9.45 to 6.55 percentage points (approximately 31% reduction). Across labels, age-sex standardisation explained approximately 25% to 40% of the crude prevalence differences.

Conclusion: A simple age-sex standardisation step explains a large proportion of apparent label prevalence differences between NIH ChestX-ray14 and CheXpert. Routine reporting of standardised prevalence alongside crude estimates and demographic composition can improve fairness and interpretability in cross-dataset benchmarking and reduce the risk of attributing demographic composition effects to labelling or model performance.
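
The statistical steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. All function names are hypothetical, the pooled form of the two-proportion z-test is an assumption (the abstract does not say which variant was used), and the stratum values in the standardisation example are invented toy numbers. Only the consolidation figures (6.12%, 5.63%, 4.16%) come from the abstract.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, n, z=Z95):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic with a two-sided p-value (assumed variant)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))


def direct_standardise(stratum_prevalence, reference_counts):
    """Weight each stratum's prevalence by that stratum's share of the reference population."""
    total = sum(reference_counts.values())
    return sum(
        stratum_prevalence[s] * reference_counts[s] / total for s in reference_counts
    )


def gap_reduction(crude, standardised, reference):
    """Fraction of the crude gap to the reference removed by standardisation."""
    crude_gap = crude - reference
    return (crude_gap - (standardised - reference)) / crude_gap


# Toy strata (invented): prevalence per stratum in the dataset being reweighted,
# and stratum counts in the reference dataset.
toy_prevalence = {"younger": 0.02, "older": 0.10}
toy_reference_counts = {"younger": 75, "older": 25}
toy_standardised = direct_standardise(toy_prevalence, toy_reference_counts)

# Consolidation figures from the abstract, in percent.
print(round(gap_reduction(6.12, 5.63, 4.16), 2))  # 0.25
```

In a real analysis the strata would be the ten age-sex cells described in the Methods (five age bands by two sexes), with CheXpert supplying the stratum prevalences and NIH ChestX-ray14 supplying the reference counts.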

Topics

COVID-19 diagnosis using AI · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education