This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Automated detection of stigmatizing language in Electronic Health Records (EHRs) using a multi-stage transfer learning approach
Citations: 2
Authors: 4
Year: 2025
Abstract
OBJECTIVE: Stigmatizing language (SL) in Electronic Health Records (EHRs) can perpetuate biases and negatively impact patient care. This study introduces a novel method for automatically detecting such language to improve healthcare documentation practices.

MATERIALS AND METHODS: We developed a multi-stage transfer learning framework integrating semantic, syntactic, and task adaptation using three datasets: hate speech, clinical phenotypes, and stigmatizing language. Experiments were conducted on the stigmatizing language dataset, which consists of 4,129 de-identified EHR notes (72.7% stigmatizing, 27.3% non-stigmatizing), split 80/20 for training and testing. Longformer, BERT, and ClinicalBERT models were evaluated, and model performance was assessed on 35 randomized subsets of the test set (each comprising 70% of the test data). The Wilcoxon-Mann-Whitney test was used to evaluate statistical significance, with Bonferroni correction applied to control for multiple hypothesis testing. Baseline models included zero-shot and few-shot GPT-4o, Support Vector Machine, Random Forest, Logistic Regression, and Multinomial Naive Bayes.

RESULTS: The proposed framework achieved the highest accuracy, with the fully adapted Longformer reaching 89.83%. Performance improvements over all baselines remained statistically significant after Bonferroni correction (p < .05). The framework demonstrated robust gains across different types of stigmatizing language.

DISCUSSION: This study underscores the value of domain-adaptive NLP for detecting stigmatizing language in EHRs. The multi-stage transfer learning framework effectively captures subtle biases often missed by conventional models, enabling more objective and respectful clinical documentation.

CONCLUSION: This study offers a statistically validated, high-performing framework for detecting stigmatizing language in EHRs, supporting responsible AI and promoting equity in clinical care.
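The evaluation protocol described in the abstract (accuracy measured on 35 randomized subsets of the test set, each covering 70% of the test data, compared with a Wilcoxon-Mann-Whitney test and Bonferroni correction) can be sketched as follows. This is a minimal illustration, not the authors' code: the predictions are synthetic, and the number of baseline comparisons (`m = 8`) is an assumption chosen for the example.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

def subset_accuracies(y_true, y_pred, n_subsets=35, frac=0.7):
    """Accuracy on randomized subsets of the test set, as in the paper's setup."""
    n = len(y_true)
    k = int(frac * n)
    accs = []
    for _ in range(n_subsets):
        idx = rng.choice(n, size=k, replace=False)  # sample 70% without replacement
        accs.append(float(np.mean(y_true[idx] == y_pred[idx])))
    return np.array(accs)

# Synthetic stand-ins for two models' predictions on a shared test set
# (~20% of 4,129 notes); accuracy levels here are illustrative only.
y_true = rng.integers(0, 2, size=826)
model_a = np.where(rng.random(826) < 0.90, y_true, 1 - y_true)  # ~90% accurate
model_b = np.where(rng.random(826) < 0.80, y_true, 1 - y_true)  # ~80% accurate

acc_a = subset_accuracies(y_true, model_a)
acc_b = subset_accuracies(y_true, model_b)

# Wilcoxon-Mann-Whitney test on the two samples of subset accuracies.
stat, p = mannwhitneyu(acc_a, acc_b, alternative="greater")

# Bonferroni correction: multiply by the number of baseline comparisons
# (m = 8 is an assumed value for this sketch).
m = 8
p_adjusted = min(p * m, 1.0)
print(p_adjusted < 0.05)
```

With a clearly better model, the corrected p-value stays below .05, mirroring the significance claim in the abstract; with near-identical models it would not, which is the point of the subset-resampling comparison.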
Similar works
"Why Should I Trust You?"
2016 · 14,704 citations
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10,545 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,931 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,532 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,046 citations