OpenAlex · Updated hourly · Last updated: 06.04.2026, 01:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

P0298 Revolutionizing Diagnostics: Evaluating ChatGPT-4’s Performance in Ulcerative Colitis Endoscopic Assessment

2025 · 0 citations · Journal of Crohn's and Colitis · Open Access

Citations: 0 · Authors: 7 · Year: 2025

Abstract

Background

The Mayo Endoscopic Subscore (MES) is a widely used measure for assessing endoscopic disease activity in ulcerative colitis (UC). This assessment plays a key role in clinical practice, as endoscopic remission is a long-term therapeutic objective. Artificial intelligence has emerged as a promising tool for enhancing diagnostic precision and addressing inter-observer variability among endoscopists. This study aims to evaluate the diagnostic accuracy of ChatGPT-4, a multimodal large language model (LLM), in identifying and grading endoscopic images of UC patients using the MES as a reference standard, without prior configuration or fine-tuning.

Methods

Real-world endoscopic images of UC patients were obtained for severity assessment and reviewed by an expert consensus board. Each image was classified by severity grade (0-3) based on the MES. Only images that were uniformly graded by the consensus board were subsequently provided to three IBD specialists and, in three separate sessions, to ChatGPT-4. The severity gradings of the IBD specialists and ChatGPT-4 were compared with the assessments made by the expert consensus board.

Results

Fifty endoscopic images were initially evaluated by the expert consensus board. Of those, 30 images (60%) were assigned an MES grade with complete agreement among the experts. Compared to the consensus board, ChatGPT-4's MES gradings were accurate in 26/30 (86.7%), 21/30 (70%) and 24/30 (80%) of images across the three sessions, with a mean accuracy rate of 78.9%. The IBD specialists' gradings were accurate in 24/30 (80%), 24/30 (80%) and 25/30 (83.3%), with a mean accuracy rate of 81.1% (figure 1). There was no statistically significant difference in mean accuracy rates between the two groups (p = 0.71).

Conclusion

ChatGPT-4 has the potential to assess mucosal inflammation severity from endoscopic images of UC patients without prior configuration or fine-tuning. Its performance was comparable to that of IBD specialists. Further research and validation are warranted to explore the broader applications of LLMs and their integration into diagnostic workflows.
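
The accuracy figures reported in the Results can be re-derived from the per-session counts. The following is a minimal Python sketch, not the authors' analysis code: the variable and function names are illustrative assumptions, and it reproduces only the percentages and group means. It does not attempt the significance test, since the abstract does not state which test produced p = 0.71.

```python
from statistics import mean

# Images graded unanimously by the consensus board (30 of the 50 evaluated).
N_IMAGES = 30

# Correct gradings per session (ChatGPT-4) or per rater (IBD specialists),
# taken from the counts reported in the abstract.
correct = {
    "ChatGPT-4": [26, 21, 24],
    "IBD specialists": [24, 24, 25],
}


def accuracy_summary(counts, n=N_IMAGES):
    """Return the per-session accuracies and their mean, both in percent."""
    per_session = [100 * c / n for c in counts]
    return per_session, mean(per_session)


summary = {group: accuracy_summary(counts) for group, counts in correct.items()}

for group, (per_session, avg) in summary.items():
    rates = ", ".join(f"{r:.1f}%" for r in per_session)
    print(f"{group}: {rates} (mean {avg:.1f}%)")
```

The group means here are unweighted averages of three rates that share the same denominator (30 images), so they equal the pooled accuracy over all 90 gradings in each group.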

Topics

Artificial Intelligence in Healthcare and Education
Radiomics and Machine Learning in Medical Imaging