This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Harnessing ChatGPT for abstract screening in health-related scoping reviews: the role of structured eligibility criteria
Citations: 0
Authors: 2
Year: 2025
Abstract
Scoping reviews, especially the screening phase, are time-consuming. Due to an exponential increase in the number of scientific articles related to health issues being published each year, the time burden is also expected to rise. In parallel, there is a gap in machine learning tools specifically tailored to support screening processes within scoping reviews. Large language models, such as ChatGPT, have shown promise in accelerating this process. Given the reliance of artificial intelligence on structured inputs, this study aims to refine eligibility criteria and analyse ChatGPT's performance in abstract screening for a scoping review on digital tools supporting interprofessional interactions in healthcare. We conducted a thematic analysis of ChatGPT 4.0's explanations based on a previously conducted scoping review and developed three refined sets of eligibility criteria (narrow, wide, and balanced). ChatGPT reassessed the abstracts against these criteria, using the human reviewers' decisions as the gold standard. We calculated performance metrics such as sensitivity, specificity, and accuracy. Additionally, we combined decisions using majority voting and human conflict resolution to assess their combined performance. Rephrasing the eligibility criteria using the Population-Concept-Context framework revealed challenges in each category: the complexity of healthcare provider interactions (population), ambiguities in defining digital tools (concept), and difficulties in identifying the healthcare setting (context). For example, ChatGPT was sometimes overinclusive, including individuals such as politicians, while at other times applying overly rigid boundaries, excluding healthcare groups like palliative care providers. The wide set of eligibility criteria achieved the highest sensitivity, while the narrow set had the highest specificity, with the original and balanced sets performing between these extremes.
Combining decisions provided a good balance between sensitivity and specificity, effectively identifying ambiguous abstracts for manual review. The combination of balanced and narrow criteria resulted in a lower overall workload compared to other combinations. Refining and structuring the eligibility criteria, as well as combining different sets, improved ChatGPT's decision-making accuracy, highlighting the importance of well-defined and well-formulated criteria for such tasks. The formulation of eligibility criteria directly impacted screening outcomes: broad criteria maximized sensitivity at the expense of specificity, while narrow criteria produced the opposite effect. Combining different approaches may allow most abstracts to be screened by a single human reviewer supported by AI, with a second human reviewer involved only in cases of discrepancy. However, challenges such as trust and transparency persist and need to be addressed for full integration into the review process.
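The evaluation logic described in the abstract can be sketched in a few lines: compare AI screening decisions against the human reviewers' gold standard to obtain sensitivity, specificity, and accuracy, and combine two criteria sets so that disagreements are flagged for a second human reviewer. This is a minimal illustrative sketch, not the authors' actual analysis code; all function names and the toy data are assumptions.

```python
def metrics(gold, pred):
    """Sensitivity, specificity, accuracy for binary include(1)/exclude(0) decisions."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(gold)
    return sensitivity, specificity, accuracy

def combine(decisions_a, decisions_b):
    """Two criteria sets agree -> final decision; disagree -> None (manual review)."""
    return [a if a == b else None for a, b in zip(decisions_a, decisions_b)]

# Toy example (hypothetical data): 1 = include, 0 = exclude
gold     = [1, 1, 0, 0, 1, 0]   # human reviewers' decisions
balanced = [1, 1, 1, 0, 0, 0]   # AI decisions under "balanced" criteria
narrow   = [1, 0, 0, 0, 0, 0]   # AI decisions under "narrow" criteria

print(metrics(gold, balanced))
print(combine(balanced, narrow))  # None marks abstracts routed to a second reviewer
```

Combining a more inclusive with a more restrictive criteria set in this way keeps the automatic decisions for concordant abstracts and isolates the ambiguous ones, which mirrors the workload reduction the abstract reports for the balanced-plus-narrow combination.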
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 cit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 cit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 cit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 cit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 cit.