This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Oncology data extraction with large language models from real-world breast cancer electronic health records in Spanish

2026 · 0 citations · Machine Learning with Applications · Open Access

Abstract

The integration of Artificial Intelligence (AI) into healthcare systems has the potential to significantly enhance patient care and streamline clinical processes. This research investigates the use of generative AI and large language models (LLMs) for oncological information extraction (IE) from real Spanish-language electronic health records (EHRs) to support clinical decision-making and research. We conducted a comparative analysis of GPT-4.5 and 11 state-of-the-art, locally executable LLM-based chatbots, including Llama 3.2, Mistral-Small 3.2, and Phi-4, on the extraction of specific clinical entities from real EHR narratives. Our evaluation workflow assessed the performance of these models under computational constraints, specifically targeting the extraction of breast cancer prognostic factors. Initial findings indicate that while open-source LLMs are improving, they do not yet match human specialists in Named Entity Recognition (NER) accuracy. The language of the clinical records notably influences performance: smaller models in particular struggle with Spanish text. However, with careful model selection and output post-processing, Mistral-Small 3.2 achieved a detection F1 score of over 74.7% for critical TNM information. This study highlights significant potential for generative AI in clinical IE but underscores the need for ongoing improvements, particularly in handling linguistic diversity. Locally managed open-source models are still far from performing like a human specialist, but addressing common model shortcomings can facilitate the integration of AI-driven solutions into public healthcare systems, thereby improving patient outcomes and fostering efficient data utilisation.
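
The abstract mentions output post-processing and a detection F1 score for TNM information without giving details. As a rough illustration only, the Python sketch below shows one way such a step could look. The regular expression, the function names `normalise_tnm` and `detection_f1`, and the record-level definition of "detection" are all assumptions made for this example and are not taken from the paper.

```python
import re

# Hypothetical pattern (an assumption, not the paper's) for compact TNM
# mentions such as "T2N1M0" or "pT1c N0 M0".
TNM_PATTERN = re.compile(
    r"\b[cpy]?T(?:[0-4][a-d]?|is|x)\s*N(?:[0-3][a-c]?|x)\s*M(?:[01]|x)\b",
    re.IGNORECASE,
)


def normalise_tnm(text):
    """Return the first TNM-like mention with whitespace removed, or None."""
    match = TNM_PATTERN.search(text)
    if match is None:
        return None
    return re.sub(r"\s+", "", match.group(0))


def detection_f1(predictions, gold):
    """Record-level detection F1 (assumed definition).

    A record counts as detected when the value is not None; whether the
    detected value is also correct is deliberately not checked here.
    """
    pairs = list(zip(predictions, gold))
    tp = sum(1 for p, g in pairs if p is not None and g is not None)
    fp = sum(1 for p, g in pairs if p is not None and g is None)
    fn = sum(1 for p, g in pairs if p is None and g is not None)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, raw model outputs would first be passed through `normalise_tnm`, and the resulting list would then be compared against annotated values with `detection_f1`. A stricter metric would additionally require the normalised strings to match the annotation.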

Topics

Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
