Human-level information extraction from clinical reports with finetuned language models
Citations: 0
Authors: 15
Year: 2025
Abstract
Extracting structured data from clinical notes remains a key bottleneck in clinical research. We hypothesized that, with minimal computational and annotation resources, open-source large language models (LLMs) could create high-quality research databases. We developed Strata, a low-code library for using LLMs to extract data from clinical reports. Trained researchers labeled four datasets of prostate MRI, breast pathology, kidney pathology, and bone marrow (MDS) pathology reports. Using Strata, we evaluated open-source LLMs, including instruction-tuned, medicine-specific, reasoning-based, and LoRA-finetuned models, and compared them with zero-shot GPT-4 and a second human annotator. Our primary evaluation metric was exact match accuracy, which assesses whether all variables in a report were extracted correctly. LoRA-finetuned Llama-3.1 8B was non-inferior to the second human annotator on all four datasets, with an average exact match accuracy of 90.0 ± 1.7%. It outperformed all other open-source models, including DeepSeek-R1-Distill-Llama and Llama-3-8B-UltraMedical, which obtained average exact match accuracies of 56.8 ± 29.0% and 39.1 ± 24.4%, respectively. GPT-4 was non-inferior to the second human annotator on all datasets except kidney pathology. Small, open-source LLMs offer an accessible solution for curating local research databases: they achieve human-level accuracy using only desktop-grade hardware and ≤ 100 training reports. Unlike commercial LLMs, they can be hosted locally and version-controlled. Strata thus enables automated, human-level extraction of structured data from clinical notes with ≤ 100 training reports and a single desktop-grade GPU.
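To make the primary metric concrete, here is a minimal Python sketch of report-level exact match accuracy. It rests on stated assumptions: each report's extraction is a dict mapping variable names to values; a report counts as correct only if every reference variable is reproduced exactly (extra predicted keys are ignored); and the "±" in the summary figures is taken to be the sample standard deviation across datasets. The function names and example fields (`pirads`, `lesion_count`) are hypothetical illustrations, not Strata's API, and a real pipeline would likely normalize values (case, whitespace, synonyms) before comparing them.

```python
from statistics import mean, stdev

# Sentinel so a missing prediction never accidentally equals a reference value of None.
_MISSING = object()


def exact_match(pred: dict, gold: dict) -> bool:
    """True only if every reference variable is present in pred with an identical value.

    Assumption: keys that appear only in pred are ignored.
    """
    return all(pred.get(key, _MISSING) == value for key, value in gold.items())


def exact_match_accuracy(preds: list[dict], golds: list[dict]) -> float:
    """Percentage of reports whose extraction matches the reference on all variables."""
    if len(preds) != len(golds):
        raise ValueError("predictions and references must be aligned one-to-one")
    if not golds:
        raise ValueError("need at least one report")
    hits = sum(exact_match(p, g) for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)


def summarize(per_dataset: dict[str, float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-dataset accuracies (assumed meaning of '±')."""
    values = list(per_dataset.values())
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical toy data: the second report has one wrong variable, so it fails entirely.
    gold = [{"pirads": "4", "lesion_count": "2"}, {"pirads": "3", "lesion_count": "1"}]
    pred = [{"pirads": "4", "lesion_count": "2"}, {"pirads": "3", "lesion_count": "2"}]
    print(exact_match_accuracy(pred, gold))  # -> 50.0
```

Because a single wrong variable fails the whole report, exact match accuracy is stricter than per-variable accuracy: a report with nine correct variables and one error scores zero.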