OpenAlex · Updated hourly · Last updated: 20 May 2026, 01:15

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

Robust comparative evaluation of 15 natural language processing algorithms to positively identify patients with inflammatory bowel disease from secondary care records

2025 · 1 citation · BMJ Open Gastroenterology · Open Access

Citations: 1 · Authors: 5 · Year: 2025
Abstract

OBJECTIVE: Natural language processing (NLP) can identify cohorts of patients with inflammatory bowel disease (IBD) from free text. However, limited sharing of code, models, and data sets continues to hinder progress. The aim of this study was to evaluate multiple open-source NLP models for identifying IBD cohorts, reporting on document-to-patient-level classification, while exploring explainability, generalisability, fairness and cost.

METHODS: 15 algorithms were assessed, covering all types of NLP and spanning over 50 years of NLP development. They ranged from rule-based approaches (regular expressions, spaCy with negation) and vector-based approaches (bag-of-words (BoW), term frequency-inverse document frequency (TF-IDF), word-2-vector) to transformers: two sentence-based sBERT models, three bidirectional encoder representations from transformers (BERT) models (distilBERT, BioclinicalBERT, RoBERTa), and five large language models (LLMs) (Mistral-Instruct-v0.3-7B, M42-Health/Llama-v3-8B, Deepseek-R1-Distill-Qwen-v2.5-32B, Qwen-v3-32B, and Deepseek-R1-Distill-Llama-v3-70B). Models were comparatively evaluated based on full confusion matrices, time/environmental costs, fairness, and explainability.

RESULTS: A total of 9311 labelled documents were evaluated. The fine-tuned DistilBERT_IBD model achieved the best performance overall (micro F1: 93.54%), followed by sBERT-Base (micro F1: 93.05%); however, specificity was an issue for both (67.80% and 64.41%, respectively). LLMs performed well, given that they had never seen the training data (micro F1: 86.47-92.20%), but were comparatively slow (18-300 hours) and expensive. Bias was a significant issue for all effective model types.

CONCLUSION: NLP has undergone significant advancements over the last 50 years. LLMs appear likely to solve the problem of re-identifying patients with IBD from clinical free text sources in the future. Once cost, performance and bias issues are addressed, they and their successors are likely to become the primary method of data retrieval for clinical data warehousing.
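The abstract reports micro F1 alongside specificity, both derived from confusion matrices. The following is a minimal sketch of how these two metrics relate for a binary classifier, assuming the standard textbook definitions; the abstract does not state the study's exact computation. The `Confusion` class, the function names, and the counts are hypothetical and are not taken from the paper. For single-label binary classification, micro-averaged F1 equals accuracy, so a model can score well on micro F1 while specificity stays low if negative documents are rare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts, with IBD as the positive class."""
    tp: int  # IBD documents labelled IBD
    fp: int  # non-IBD documents labelled IBD
    tn: int  # non-IBD documents labelled non-IBD
    fn: int  # IBD documents labelled non-IBD


def specificity(c: Confusion) -> float:
    # True-negative rate: share of non-IBD documents correctly rejected.
    return c.tn / (c.tn + c.fp)


def micro_f1(c: Confusion) -> float:
    # Micro-averaging pools counts over both classes. Viewing non-IBD as
    # the "positive" class swaps tp with tn and fp with fn, so the pooled
    # false positives and pooled false negatives are the same number.
    pooled_tp = c.tp + c.tn
    pooled_fp = c.fp + c.fn
    pooled_fn = c.fn + c.fp
    precision = pooled_tp / (pooled_tp + pooled_fp)
    recall = pooled_tp / (pooled_tp + pooled_fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical, imbalanced counts chosen for illustration only.
example = Confusion(tp=900, fp=40, tn=60, fn=0)

print(f"specificity: {specificity(example):.2%}")
print(f"micro F1:    {micro_f1(example):.2%}")
print(f"accuracy:    {accuracy(example):.2%}")
```

With these illustrative counts, 40 of the 100 negative documents are misclassified, yet the pooled score remains high because positives dominate the sample. This is one reason to read the full confusion matrix rather than a single summary figure.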

Similar works