This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Benchmark analysis of myopia-related issues using large language models: a comparison of ChatGPT-4o and DeepSeek
Citations: 1
Authors: 5
Year: 2025
Abstract
OBJECTIVE: This study evaluated the accuracy and comprehensiveness of responses generated by ChatGPT-4o and DeepSeek regarding commonly asked questions about myopia.
METHODS: Thirty myopia-related questions spanning six clinical domains were submitted to both chatbots. Three medical professionals independently rated each response for accuracy and comprehensiveness. Inter-rater reliability was assessed using Fleiss' Kappa, and Shapiro-Wilk tests were conducted to examine normality in rating distributions. Statistical comparisons were performed using the Chi-square test, with significance set at p < 0.05.
RESULTS: DeepSeek outperformed ChatGPT-4o in overall accuracy, with significantly more responses rated as "Good" (p < 0.0001). Both models demonstrated high comprehensiveness scores when accuracy was rated "Good," though performance declined in treatment-related queries, particularly regarding commercial products like DIMS lenses. Fleiss' Kappa values indicated poor inter-rater agreement (DeepSeek: κ = 0.106; ChatGPT-4o: κ = -0.0221), and normality tests showed non-normal score distributions (p < 0.0001 across domains).
CONCLUSION: Both ChatGPT-4o and DeepSeek can deliver useful responses to myopia-related questions, though limitations remain in areas requiring up-to-date, region-specific treatment information. DeepSeek's stronger performance suggests that localized LLMs may offer competitive advantages. Ongoing refinement, regular data updates, and domain-specific fine-tuning are essential for improving the reliability of AI chatbots in clinical communication.
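The abstract reports inter-rater agreement via Fleiss' Kappa. The study's underlying rating table is not included here, but the statistic itself is straightforward to compute from a subjects × categories count table; the sketch below is a generic implementation with hypothetical illustrative data (three raters, two rating categories), not the paper's actual data.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a subjects x categories count table.

    table[i][j] = number of raters who assigned subject i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(table)          # number of subjects (here: rated responses)
    n = sum(table[0])       # number of raters per subject
    k = len(table[0])       # number of rating categories

    # Per-subject observed agreement P_i
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P_i) / N

    # Chance agreement P_e from marginal category proportions
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 4 responses, 3 raters, categories ("Good", "Not good")
ratings = [
    [3, 0],  # all three raters said "Good"
    [2, 1],
    [1, 2],
    [0, 3],  # all three raters said "Not good"
]
print(fleiss_kappa(ratings))  # → 0.333... (moderate chance-corrected agreement)
```

Values near 0, like the κ = 0.106 and κ = -0.0221 reported in the abstract, indicate agreement barely above (or below) what chance alone would produce.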
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations