This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Converting unstructured cardiac catheterization and echocardiography reports into structured data using transformer-based language models

2026 · 0 citations · JAMIA Open · Open Access

Abstract

Objectives: Echocardiography and cardiac catheterization reports capture important clinical assessments of cardiac function and disease severity. This study explores open-source transformer-based language models (LMs) run locally within an institutional environment, as a privacy-preserving alternative to external API-based large LMs, to systematically extract clinical data from unstructured echocardiography and cardiac catheterization reports, with the aim of improving data accessibility for research and patient care.

Materials and Methods: Two transformer-based LMs, BioclinicalBERT and BART-Large-CNN, were fine-tuned in a secure local environment using a question-answering approach. The dataset included 3286 echocardiography and 1884 cardiac catheterization reports from Kaiser Permanente Southern California's electronic health records, annotated for 25 and 47 predefined categories, respectively. Three hundred reports of each type were randomly selected for validation, with the remainder used for training. Model performance was assessed using accuracy, precision, recall, and F1-score at 2 probability thresholds. The effect of training set size on model performance was also evaluated.

Results: Both models achieved consistently high accuracy, precision, and recall (all >90%) across the 5 seed runs for both report types. For echocardiography, BioclinicalBERT reached a mean accuracy of 95.7%, precision of 97.6%, recall of 97.4%, and F1-score of 0.98 at the probability threshold of 0.1; BART-Large-CNN had similar results. For cardiac catheterization, BART-Large-CNN slightly outperformed BioclinicalBERT at the probability threshold of 0.1, with mean accuracy of 94.9% vs 94.3%, precision of 96.7% vs 96.3%, recall of 96.1% vs 95.7%, and F1-score of 0.96 vs 0.96. Most individual categories showed strong performance, though a few (eg, prosthetic mitral valve, right atrial pressure) had lower scores. Performance improved with more training data but plateaued at around 1000 reports.

Discussion and conclusion: Fine-tuned transformer-based LMs can effectively extract structured data from unstructured cardiac reports, supporting automated information extraction to enhance research and clinical applications.
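The evaluation setup described in the abstract (a random hold-out of 300 reports and metrics computed at a probability threshold) can be illustrated with a minimal Python sketch. This is not the authors' code. The names `Prediction`, `split_reports`, and `score` are hypothetical, and the scoring rules are an assumption: an answer whose probability falls below the threshold is treated as "no answer", and a wrong answer counts as both a false positive and a false negative. The abstract names only the 0.1 threshold, so any other value used with this sketch is purely illustrative.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    gold: Optional[str]    # annotated answer; None if the category is absent
    answer: Optional[str]  # answer span proposed by the model
    prob: float            # model probability attached to that answer


def split_reports(report_ids, n_validation=300, seed=0):
    """Randomly hold out n_validation reports; the rest are used for training."""
    rng = random.Random(seed)
    shuffled = list(report_ids)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


def score(predictions, threshold):
    """Accuracy, precision, recall, and F1 at one probability threshold.

    Assumed rules (not taken from the paper): answers below the threshold
    are discarded, and exact string match decides correctness.
    """
    tp = fp = fn = correct = 0
    for p in predictions:
        predicted = p.answer if p.prob >= threshold else None
        if predicted == p.gold:
            correct += 1
        if predicted is not None and predicted == p.gold:
            tp += 1
        if predicted is not None and predicted != p.gold:
            fp += 1
        if p.gold is not None and predicted != p.gold:
            fn += 1
    total = len(predictions)
    accuracy = correct / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Calling `score(predictions, 0.1)` and then `score(predictions, t)` for a second threshold `t` on the same predictions shows how raising the threshold trades recall for precision, which is presumably why the study reports results at 2 thresholds.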

Topics

Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · ECG Monitoring and Analysis
