This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Leveraging large language models for the deidentification and temporal normalization of sensitive health information in electronic health records

2025 · 1 citation · npj Digital Medicine · Open Access

Abstract

Secondary use of electronic health record notes enhances clinical outcomes and personalized medicine, but risks sensitive health information (SHI) exposure. Inconsistent time formats hinder interpretation, necessitating deidentification and temporal normalization. The SREDH/AI CUP 2023 competition explored large language models (LLMs) for these tasks using 3,244 pathology reports with surrogated SHIs and normalized dates. The competition drew 291 teams; the top teams achieved macro-F1 scores >0.8. Results were presented at the IW-DMRN workshop in 2024. Notably, 77.2% used LLMs, highlighting their growing role in healthcare. This study compares competition results with in-context learning and fine-tuned LLMs. Findings show that fine-tuning, especially with lower-rank adaptation, boosts performance but plateaus or degrades in models over 6 B parameters due to overfitting. Our findings highlight the value of data augmentation, training strategies, and hybrid approaches. Effective LLM-based deidentification requires balancing performance with legal and ethical demands, ensuring privacy and interpretability in regulated healthcare settings.

Topics

Machine Learning in Healthcare · Electronic Health Records Systems · Artificial Intelligence in Healthcare and Education
