This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Automated detection of stigmatizing language in Electronic Health Records (EHRs) using a multi-stage transfer learning approach
Citations: 2
Authors: 4
Year: 2025
Abstract
OBJECTIVE: Stigmatizing language (SL) in Electronic Health Records (EHRs) can perpetuate biases and negatively impact patient care. This study introduces a novel method for automatically detecting such language to improve healthcare documentation practices.

MATERIALS AND METHODS: We developed a multi-stage transfer learning framework integrating semantic, syntactic, and task adaptation using three datasets: hate speech, clinical phenotypes, and stigmatizing language. Experiments were conducted on the stigmatizing language dataset, which consists of 4,129 de-identified EHR notes (72.7% stigmatizing, 27.3% non-stigmatizing), split 80/20 for training and testing. Longformer, BERT, and ClinicalBERT models were evaluated, and model performance was assessed on 35 randomized subsets of the test set (each comprising 70% of the test data). The Wilcoxon-Mann-Whitney test was used to evaluate statistical significance, with Bonferroni correction applied to control for multiple hypothesis testing. Baseline models included zero-shot and few-shot GPT-4o, Support Vector Machine, Random Forest, Logistic Regression, and Multinomial Naive Bayes.

RESULTS: The proposed framework achieved the highest accuracy, with the fully adapted Longformer reaching 89.83%. Performance improvements over all baselines remained statistically significant after Bonferroni correction (p < .05). The framework demonstrated robust gains across different types of stigmatizing language.

DISCUSSION: This study underscores the value of domain-adaptive NLP for detecting stigmatizing language in EHRs. The multi-stage transfer learning framework effectively captures subtle biases often missed by conventional models, enabling more objective and respectful clinical documentation.

CONCLUSION: This study offers a statistically validated, high-performing framework for detecting stigmatizing language in EHRs, supporting responsible AI and promoting equity in clinical care.
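The evaluation protocol described in the abstract (accuracy measured on 35 randomized subsets of the test set, each covering 70% of the test data, compared with a Wilcoxon-Mann-Whitney test and Bonferroni correction) can be sketched as follows. This is a minimal illustration, not the authors' code: the predictions are synthetic, and the number of baseline comparisons (`m = 8`) is an assumption chosen for the example.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

def subset_accuracies(y_true, y_pred, n_subsets=35, frac=0.7):
    """Accuracy on randomized subsets of the test set, as in the paper's setup."""
    n = len(y_true)
    k = int(frac * n)
    accs = []
    for _ in range(n_subsets):
        idx = rng.choice(n, size=k, replace=False)  # sample 70% without replacement
        accs.append(float(np.mean(y_true[idx] == y_pred[idx])))
    return np.array(accs)

# Synthetic stand-ins for two models' predictions on a shared test set
# (~20% of 4,129 notes); accuracy levels here are illustrative only.
y_true = rng.integers(0, 2, size=826)
model_a = np.where(rng.random(826) < 0.90, y_true, 1 - y_true)  # ~90% accurate
model_b = np.where(rng.random(826) < 0.80, y_true, 1 - y_true)  # ~80% accurate

acc_a = subset_accuracies(y_true, model_a)
acc_b = subset_accuracies(y_true, model_b)

# Wilcoxon-Mann-Whitney test on the two samples of subset accuracies.
stat, p = mannwhitneyu(acc_a, acc_b, alternative="greater")

# Bonferroni correction: multiply by the number of baseline comparisons
# (m = 8 is an assumed value for this sketch).
m = 8
p_adjusted = min(p * m, 1.0)
print(p_adjusted < 0.05)
```

With a clearly better model, the corrected p-value stays below .05, mirroring the significance claim in the abstract; with near-identical models it would not, which is the point of the subset-resampling comparison.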
Similar works
"Why Should I Trust You?"
2016 · 14,704 citations
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10,545 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,931 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,532 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,046 citations