This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Large language models for risk-of-bias assessment in randomised clinical trials—a comparative validation study
Citations: 0
Authors: 8
Year: 2026
Abstract
BACKGROUND: Large language models (LLMs) are emerging tools for evidence synthesis. Risk of bias (RoB) assessment of trials remains an essential but time-consuming step that is inconsistent even amongst experts. Early LLM studies showed mixed reliability. Advances in reasoning-enabled models warrant evaluation of their accuracy and consistency for RoB screening across randomised trials to reduce reviewer workload. METHODS: -score). FINDINGS: For RoB 1, interobserver agreement ranged from κ 0.27 (95% CI 0.07-0.46) with Gemini Flash 2.0 to κ 0.39 (0.20-0.59) with DeepSeek v3. For RoB 2, agreement was lower, from κ 0.06 (-0.07 to 0.18) with ChatGPT o3 to κ 0.13 (-0.04 to 0.31) with Gemini. Diagnostic performance was limited, with sensitivity ranging from 0.05 to 0.55, specificity 0.78-0.99, PPV 0.31-0.50, and NPV 0.48-0.61 across models, with models consistently over-flagging concerns. INTERPRETATION: None of the evaluated LLMs were sufficiently reliable for fully autonomous RoB assessment. DeepSeek v3 and ChatGPT o3 approximated human performance best on RoB 1, but RoB 2 rule-in and rule-out performance remained modest. Current use should be supervised, with possible application of LLMs for triage or as a second assessor. Major improvements in protocol retrieval, task-specific tuning, and calibrated thresholds, prospectively validated, are needed for safe stand-alone deployment. FUNDING: This study received no financial support.
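For readers unfamiliar with the statistics reported in the abstract, the following is a minimal sketch of how Cohen's κ, sensitivity, specificity, PPV, and NPV can be computed from paired judgements. It is not the authors' analysis code: the function names, the binary "high"/"low" labelling, and the toy data are all assumptions made for illustration, and the abstract does not say how the study's categories were dichotomised.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: share of items on which both raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginals.
    p_expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def diagnostic_metrics(reference, predicted, positive="high"):
    """Sensitivity, specificity, PPV and NPV against a reference standard."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    tn = sum(r != positive and p != positive for r, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy data (NOT from the study): human reference vs. model output.
human = ["high"] * 4 + ["low"] * 6
model = ["high", "high", "low", "low", "low", "low", "low", "low", "low", "high"]

print(round(cohen_kappa(human, model), 2))
print(diagnostic_metrics(human, model))
```

In this toy example the raters agree on 7 of 10 items, yet κ is only about 0.35 because much of that agreement is expected by chance. The same effect helps explain why agreement values such as those in the abstract can be low even when raw percentage agreement looks reasonable.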
Similar works
Applied logistic regression
1990 · 35,656 citations
The central role of the propensity score in observational studies for causal effects
1983 · 30,694 citations
SPSS and SAS procedures for estimating indirect effects in simple mediation models
2004 · 17,101 citations
A Proportional Hazards Model for the Subdistribution of a Competing Risk
1999 · 13,495 citations
Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models
1982 · 12,616 citations