This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
P0298 Revolutionizing Diagnostics: Evaluating ChatGPT-4’s Performance in Ulcerative Colitis Endoscopic Assessment
Citations: 0
Authors: 7
Year: 2025
Abstract
Background: The Mayo Endoscopic Subscore (MES) is a widely used measure of endoscopic disease activity in ulcerative colitis (UC). This assessment plays a key role in clinical practice because endoscopic remission is a long-term therapeutic objective. Artificial intelligence has emerged as a promising tool for enhancing diagnostic precision and addressing inter-observer variability among endoscopists. This study aims to evaluate the diagnostic accuracy of ChatGPT-4, a multimodal large language model (LLM), in identifying and grading endoscopic images of UC patients, using the MES as the reference standard and without prior configuration or fine-tuning.

Methods: Real-world endoscopic images of UC patients were obtained for severity assessment and reviewed by an expert consensus board. Each image was classified by severity grade (0-3) based on the MES. Only images that were uniformly graded by the consensus board were subsequently provided to three IBD specialists and to ChatGPT-4 in three separate sessions. The severity gradings of the IBD specialists and of ChatGPT-4 were compared with the assessments made by the expert consensus board.

Results: Fifty endoscopic images were initially evaluated by the expert consensus board. Of those, 30 images (60%) received the same MES from all experts. Compared with the consensus board, ChatGPT-4's MES gradings were accurate in 26/30 (86.7%), 21/30 (70%) and 24/30 (80%), with a mean accuracy rate of 78.9%. The IBD specialists' gradings were accurate in 24/30 (80%), 24/30 (80%) and 25/30 (83.3%), with a mean accuracy rate of 81.1% (figure 1). There was no statistically significant difference in mean accuracy rates between the two groups (p = 0.71).

Conclusion: ChatGPT-4 shows potential for assessing the severity of mucosal inflammation from endoscopic images of UC patients without prior configuration or fine-tuning. Its performance was comparable to that of IBD specialists. Further research and validation are warranted to explore the broader applications of LLMs and their integration into diagnostic workflows.
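The accuracy figures in the Results can be checked with simple arithmetic. The following is a minimal sketch, not the authors' analysis code: the variable and function names are illustrative assumptions, and only the counts stated in the abstract are used. It does not attempt to reproduce the reported p-value, since the abstract does not name the statistical test.

```python
# Counts of correct MES gradings out of 30 consensus-graded images,
# taken directly from the abstract. Names below are illustrative only.
TOTAL_IMAGES = 30
chatgpt4_correct = [26, 21, 24]      # three reported ChatGPT-4 results
specialists_correct = [24, 24, 25]   # three reported IBD specialist results


def accuracy_rates(correct_counts, total):
    """Convert counts of correct gradings into percentage accuracy rates."""
    return [100.0 * count / total for count in correct_counts]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


chatgpt4_rates = accuracy_rates(chatgpt4_correct, TOTAL_IMAGES)
specialist_rates = accuracy_rates(specialists_correct, TOTAL_IMAGES)

# Mean accuracy per group, rounded to one decimal place as in the abstract.
chatgpt4_mean = round(mean(chatgpt4_rates), 1)
specialist_mean = round(mean(specialist_rates), 1)

print(f"ChatGPT-4 mean accuracy: {chatgpt4_mean}%")
print(f"IBD specialists mean accuracy: {specialist_mean}%")
```

Under these assumptions the computed means agree with the 78.9% and 81.1% reported in the abstract.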