This is an overview page with metadata for this scientific article. The full article is available from the publisher.
A new peer reviewer? Comparing AI with human performance in randomized controlled trial risk-of-bias assessment
Citations: 0
Authors: 6
Year: 2026
Abstract
BACKGROUND: Risk-of-bias (RoB) assessment is essential for evidence synthesis but remains time-consuming and inherently subjective. Artificial intelligence (AI) may improve the efficiency of systematic reviews; however, its reliability in reproducing expert RoB judgements remains uncertain. OBJECTIVES: To compare the performance of AI models and human raters in RoB assessment of randomized controlled trials (RCTs) using the revised Joanna Briggs Institute (JBI) critical appraisal tool. MATERIAL AND METHODS: Thirteen RCTs published between 2023 and 2025 in orthopedic journals were independently assessed by 2 human raters (an expert (R1) and a novice (R2)) and 2 AI models (ChatGPT-4.0 (CGPT) and DeepSeek-R1 (DS)) using the 13-domain JBI checklist. Deep-reasoning functionalities (e.g., chain-of-thought prompting) were applied. Inter-rater agreement, deviations from the expert assessment (reference standard), and binary disagreements (e.g., Yes vs No) were analyzed to evaluate consistency. RESULTS: The AI models demonstrated high inter-model agreement (91%), exceeding human-AI agreement (CGPT vs R1: 64%; DS vs R1: 68%). However, both AI systems showed substantial divergence from expert judgements in interpretive domains, including allocation concealment (Q2), blinding (Q7), and overall trial design (Q13), with deviation rates ranging from 30% to 38.5%. Binary decision reversals were more frequent in AI assessments (CGPT: 8.9%; DS: 7.7%) than in the human comparison (R2 vs R1: 2.4%). Human raters showed stronger agreement in contextual interpretation (R1-R2: 89.3%), whereas AI models performed better in rule-based domains (Q8/Q9: 100% agreement). CONCLUSIONS: AI can reliably support the automation of objective components of RoB assessment but remains limited in handling interpretive, context-dependent judgements. 
A hybrid approach combining AI-assisted pre-screening with expert evaluation may enhance the scalability of systematic reviews without compromising methodological rigor.
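The consistency metrics described in the abstract (inter-rater agreement and binary Yes/No decision reversals) can be sketched as follows. This is a minimal illustration only: the ratings, function names, and the treatment of "Unclear" here are assumptions, not the study's actual data or analysis code.

```python
# Hypothetical sketch of the two consistency metrics named in the abstract.
# Ratings below are illustrative, NOT the study's data.

def percent_agreement(a, b):
    """Share of domains on which two raters give the identical judgement."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def binary_reversals(a, b):
    """Share of domains where one rater says 'Yes' and the other 'No'
    (pairs involving 'Unclear' are not counted as reversals)."""
    assert len(a) == len(b)
    flips = sum({x, y} == {"Yes", "No"} for x, y in zip(a, b))
    return flips / len(a)

# Illustrative judgements for five JBI domains (expert vs. AI model).
expert = ["Yes", "No", "Unclear", "Yes", "Yes"]
model  = ["Yes", "Yes", "Unclear", "Yes", "No"]

print(f"agreement: {percent_agreement(expert, model):.0%}")  # → 60%
print(f"reversals: {binary_reversals(expert, model):.0%}")   # → 40%
```

In practice a study like this would likely also report a chance-corrected statistic such as Cohen's kappa rather than raw percent agreement alone.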
Similar works
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
2021 · 89,315 citations
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 83,027 citations
The Measurement of Observer Agreement for Categorical Data
1977 · 77,767 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 63,381 citations
Measuring inconsistency in meta-analyses
2003 · 62,051 citations