OpenAlex · Updated hourly · Last updated: 30 Mar 2026, 07:32

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Harnessing ChatGPT for abstract screening in health-related scoping reviews: the role of structured eligibility criteria

2025 · 0 citations · BMC Health Services Research · Open Access
Open full text at the publisher

0 citations · 2 authors · 2025

Abstract

Scoping reviews, especially the screening phase, are time-consuming. Due to an exponential increase in the number of scientific articles related to health issues being published each year, the time burden is also expected to rise. In parallel, there is a gap in machine learning tools specifically tailored to support screening processes within scoping reviews. Large language models, such as ChatGPT, have shown promise in accelerating this process. Given the reliance of artificial intelligence on structured inputs, this study aims to refine eligibility criteria and analyse ChatGPT's performance in abstract screening for a scoping review on digital tools supporting interprofessional interactions in healthcare.

We conducted a thematic analysis of ChatGPT 4.0's explanations based on a previously conducted scoping review and developed three refined sets of eligibility criteria (narrow, wide, and balanced). ChatGPT reassessed these criteria using the human reviewers' decisions as the gold standard. We calculated performance metrics such as sensitivity, specificity, and accuracy. Additionally, we combined decisions using majority voting and human conflict resolution to assess their combined performance.

Rephrasing the eligibility criteria using the Population-Concept-Context framework revealed challenges in each category: the complexity of healthcare provider interactions (population), ambiguities in defining digital tools (concept), and difficulties in identifying the healthcare setting (context). For example, ChatGPT was sometimes overinclusive, including individuals such as politicians, while at other times applying overly rigid boundaries, excluding healthcare groups like palliative care providers. The wide set of eligibility criteria achieved the highest sensitivity, while the narrow set had the highest specificity, with the original and balanced sets performing between these extremes. Combining decisions provided a good balance between sensitivity and specificity, effectively identifying ambiguous abstracts for manual review. The combination of balanced and narrow criteria resulted in a lower overall workload compared to other combinations.

Refining and structuring the eligibility criteria, as well as combining different sets, improved ChatGPT's decision-making accuracy, highlighting the importance of well-defined and well-formulated criteria for such tasks. The formulation of eligibility criteria directly impacted screening outcomes: broad criteria maximized sensitivity at the expense of specificity, while narrow criteria produced the opposite effect. Combining different approaches may enable most of the screening to be performed by a single human reviewer supported by AI, with a second human reviewer brought in only where the decisions conflict. However, challenges such as trust and transparency persist and need to be addressed for full integration into the review process.
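The evaluation described above can be sketched in a few lines of code. This is an illustrative example, not the authors' implementation: the labels, criteria sets, and helper names are hypothetical, and the combination rule shown (accept agreement, flag disagreement for a human reviewer) is one way to realize the "human conflict resolution" step the abstract describes.

```python
# Hypothetical sketch of the abstract's evaluation: screening metrics
# against a human gold standard, plus combining two AI criteria sets
# and flagging disagreements for manual review.

def screening_metrics(gold, predicted):
    """Sensitivity, specificity, and accuracy.
    Labels: True = include, False = exclude."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    tn = sum(not g and not p for g, p in zip(gold, predicted))
    fn = sum(g and not p for g, p in zip(gold, predicted))
    fp = sum(not g and p for g, p in zip(gold, predicted))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(gold),
    }

def combine(decisions_a, decisions_b):
    """Accept decisions where both criteria sets agree; collect the
    indices of disagreements for a human reviewer (decision = None)."""
    combined, to_review = [], []
    for i, (a, b) in enumerate(zip(decisions_a, decisions_b)):
        if a == b:
            combined.append(a)
        else:
            combined.append(None)
            to_review.append(i)
    return combined, to_review

# Toy data for six abstracts (not from the study):
gold     = [True, True, False, False, True, False]   # human reviewers
balanced = [True, False, False, False, True, True]   # "balanced" criteria
narrow   = [True, False, False, False, True, False]  # "narrow" criteria

print(screening_metrics(gold, balanced))
combined, to_review = combine(balanced, narrow)
print(to_review)  # abstracts the two criteria sets disagree on
```

The workload trade-off the study reports falls out of this scheme directly: the fewer abstracts the two criteria sets disagree on, the fewer a second human reviewer has to check, which is why the balanced-plus-narrow pairing yielded the lowest overall workload.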


Topics

Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)