This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Augmenting Oncology Guideline Maintenance with Large Language Models: A Prospective Evaluation (Preprint)
Citations: 0
Authors: 4
Year: 2026
Abstract
BACKGROUND: Maintenance of oncology clinical practice guidelines (CPGs) is increasingly challenged by the rapid growth of trial data and therapeutic complexity. While large language models (LLMs) have shown promise in information retrieval, their utility in the rigorous, end-to-end workflow of guideline maintenance remains underexplored.

OBJECTIVE: This study aimed to systematically evaluate the performance of frontier LLMs in supporting oncology guideline maintenance. We sought to determine their reliability in predicting necessary guideline updates based on new evidence, their accuracy in extracting data from clinical trials, and their effectiveness as automated auditors for detecting errors in established guidelines.

METHODS: Using the Onkopedia peripheral T-cell lymphoma (PTCL) guideline as a natural experiment, we tasked frontier models with deep-research modes (Gemini 2.5 Pro, GPT o4-mini-high) to predict a guideline update in August 2025 based on the 2021 version. Predictions were validated against the official 2025 revision published in October 2025. Next, we benchmarked evidence-extraction accuracy across 80 pivotal trials using models of varying scale (27B–671B parameters vs. frontier). Finally, we deployed a stacked LLM workflow to audit 28 recently updated Onkopedia guidelines for linguistic and content-related errors.

RESULTS: In the predictive task, models captured 36.7–40% of substantive updates, often identifying landmark approvals but frequently overstating evidence. While frontier models demonstrated high accuracy (up to 99.2%) in extracting data from individual studies, substantially outperforming smaller open-source models, this precision declined during multi-source synthesis. As automated auditors of existing CPGs, the models identified a median of 16.5 formal errors per document and detected several clinically relevant inconsistencies (e.g., invalid scoring formulas, incorrect staging definitions).

CONCLUSIONS: LLMs currently lack the reasoning stability for autonomous guideline authoring due to deficits in complex synthesis. However, they are effective tools for high-fidelity evidence extraction and automated quality assurance, supporting a human-led, AI-augmented workflow for efficient guideline maintenance.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,316 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,177 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,575 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,468 citations