OpenAlex · Updated hourly · Last updated: 02 Apr 2026, 17:09

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

ChatGPT-5 versus other mainstream large language models in core diabetic retinopathy patient queries

2026 · 0 citations · Frontiers in Cell and Developmental Biology · Open Access

Citations: 0
Authors: 8
Year: 2026

Abstract

Background: Diabetic retinopathy is a leading cause of preventable vision loss, and patients increasingly seek disease-related information through online consultations. Large language models may support patient education, but their reliability and usability vary across systems, particularly in disease-specific settings.

Methods: Thirty common patient questions about diabetic retinopathy were developed from guidelines and organized into five domains: disease overview, screening and diagnosis, treatment and follow-up, lifestyle and prevention, and prognosis and complication management. From November 10 to 15, 2025, two researchers independently submitted all questions to five models (ChatGPT-5, DeepSeek-V3.1, Doubao, Wenxinyiyan 4.5 Turbo, and Kimi) on public platforms under identical conditions without system prompts. Chat histories were reset before each question. Response time, response length, structural metrics, and table outputs were extracted. Two retinal specialists rated each answer on a 1-to-5 Likert scale across accuracy, logical consistency, coherence, safety, and content accessibility. Inter-rater agreement was assessed with the intraclass correlation coefficient. Group differences were analyzed using analysis of variance or the Kruskal–Wallis H test with Bonferroni-corrected pairwise comparisons.

Results: Significant between-model differences were observed in output efficiency and textual characteristics (all P < 0.001). ChatGPT-5 responded fastest (15.92 ± 4.48 s), whereas Wenxinyiyan 4.5 Turbo and DeepSeek-V3.1 were slowest (41.89 ± 5.09 s and 38.20 ± 2.96 s, respectively). DeepSeek-V3.1 generated the longest answers (1396.37 ± 189.23 words), while Kimi produced the shortest (579.40 ± 182.96 words). Only ChatGPT-5 consistently generated structured tables (median 2.00, IQR 1.00–2.00). Content quality differed significantly across all five dimensions (H = 15.34–37.19, all P ≤ 0.004). ChatGPT-5 achieved the highest median scores for accuracy (5.00, IQR 4.00–5.00) and logical consistency (4.50, IQR 4.00–5.00), whereas Kimi showed the lowest accuracy (3.50, IQR 3.00–4.00). The intraclass correlation coefficient indicated good inter-rater reliability (0.87).

Conclusion: The performance of large language models in diabetic retinopathy patient consultations is model-dependent. ChatGPT-5 demonstrated the best overall usability, combining faster responses, clearer structure, and higher factual accuracy. The other Chinese-optimized models provided comparable professional information coverage but require improved accessibility and stability for safe patient-facing use.
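The group comparison described in the Methods (a Kruskal–Wallis H test across models on Likert ratings) can be sketched in a few lines. This is a minimal, self-contained illustration on synthetic data: the scores and model labels below are hypothetical placeholders, not the study's ratings, and the tie correction is included because 1-to-5 Likert scales produce many tied ranks.

```python
# Illustrative sketch only: synthetic 1-5 Likert scores, NOT the study's data.
# Mirrors the abstract's analysis shape: a tie-corrected Kruskal-Wallis H test
# comparing rating distributions across models.
from collections import Counter

def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Assign each distinct value its mid-rank (average rank of its tied block)
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    # H = 12 / (N(N+1)) * sum(R_g^2 / n_g) - 3(N+1)
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    # Divide by the tie-correction factor 1 - sum(t^3 - t) / (N^3 - N)
    ties = Counter(pooled)
    correction = 1 - sum(t**3 - t for t in ties.values()) / (n**3 - n)
    return h / correction

# Hypothetical accuracy ratings for three of the five models (placeholder values)
scores = [
    [5, 5, 4, 5, 4, 5, 4, 5, 5, 4],  # e.g. ChatGPT-5
    [4, 4, 3, 4, 5, 4, 4, 3, 4, 4],  # e.g. DeepSeek-V3.1
    [3, 4, 3, 3, 4, 3, 4, 3, 3, 4],  # e.g. Kimi
]
print(f"H = {kruskal_wallis_h(scores):.2f}")
```

In practice one would use `scipy.stats.kruskal` plus Bonferroni-corrected pairwise follow-up tests, as the abstract describes; the hand-rolled version above just makes the rank-sum arithmetic explicit.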
