This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Assessment of bias in scoring of AI-based radiotherapy segmentation and planning studies using modified TRIPOD and PROBAST guidelines as an example
10 citations · 7 authors · 2024
Abstract
BACKGROUND AND PURPOSE: Studies investigating the application of Artificial Intelligence (AI) in radiotherapy vary substantially in quality. The goal of this study was to assess transparency and bias when scoring articles, with a specific focus on AI-based segmentation and treatment planning, using modified PROBAST and TRIPOD checklists, in order to provide recommendations for future guideline developers and reviewers.

MATERIALS AND METHODS: The TRIPOD and PROBAST checklist items were discussed and modified using a Delphi process. After consensus was reached, 2 groups of 3 co-authors scored 2 articles to evaluate usability and further optimize the adapted checklists. Finally, 10 articles were scored by all co-authors. Fleiss' kappa was calculated to assess the reliability of agreement between observers.

RESULTS: Three of the 37 TRIPOD items and 5 of the 32 PROBAST items were deemed irrelevant. General terminology in the items (e.g., multivariable prediction model, predictors) was modified to align with AI-specific terms. After the first scoring round, further improvements to the items were formulated, e.g., by preventing the use of sub-questions or subjective words and by adding clarifications on how to score an item. Using the final consensus list to score the 10 articles, only 2 of the 61 items yielded a statistically significant kappa of 0.4 or more, demonstrating substantial agreement. For 41 items no statistically significant kappa was obtained, indicating that the observed agreement among observers was attributable to chance alone.

CONCLUSION: Our study showed low reliability scores with the adapted TRIPOD and PROBAST checklists. Although such checklists have shown great value during development and reporting, this raises concerns about their applicability for objectively scoring scientific articles on AI applications. When developing or revising guidelines, it is essential to consider their applicability to scoring articles without introducing bias.
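For illustration only (this sketch is not part of the article): the inter-observer agreement statistic used above, Fleiss' kappa, compares observed per-subject agreement against the agreement expected by chance from the marginal category proportions. A minimal NumPy implementation, assuming the ratings are given as an N × k count matrix with a fixed number of raters per subject:

```python
import numpy as np

def fleiss_kappa(ratings) -> float:
    """Fleiss' kappa for an N x k matrix, where ratings[i, j] is the
    number of raters who assigned subject i to category j. Every row
    must sum to the same number of raters n."""
    ratings = np.asarray(ratings, dtype=float)
    N = ratings.shape[0]                 # number of subjects
    n = ratings.sum(axis=1)[0]           # raters per subject
    # Observed agreement P_i for each subject
    P_i = (np.square(ratings).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()
    # Chance agreement P_e from marginal category proportions
    p_j = ratings.sum(axis=0) / (N * n)
    P_e = np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 4 articles, 3 observers, 2 categories,
# with all observers agreeing on every article.
perfect = [[3, 0], [3, 0], [0, 3], [0, 3]]
print(round(fleiss_kappa(perfect), 3))  # 1.0
```

Kappa is 1 for perfect agreement, 0 when agreement equals chance, and negative when observers agree less often than chance would predict.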
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,687 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,591 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,114 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,867 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Authors
Institutions
- Radboud University Nijmegen (NL)
- Catharina Ziekenhuis (NL)
- Eindhoven University of Technology (NL)
- Université Paris Cité (FR)
- Hôpital Européen Georges-Pompidou (FR)
- European Organisation for Research and Treatment of Cancer (BE)
- Université Libre de Bruxelles (BE)
- Institut Jules Bordet (BE)
- Maastricht University (NL)
- Maastro Clinic (NL)
- University of Zurich (CH)
- University Hospital of Zurich (CH)