This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Artificial intelligence‐generated patient information on shoulder instability remains suboptimal: DeepSeek outperforms ChatGPT in completeness of content while ChatGPT is more readable

2026 · 0 citations · Knee Surgery Sports Traumatology Arthroscopy

Abstract

PURPOSE: This study aimed to evaluate and compare the performance of the Chat Generative Pre-Trained Transformer (ChatGPT) and DeepSeek artificial intelligence (AI) models in providing patient information on shoulder instability.

METHODS: Sixteen frequently asked questions related to shoulder instability were posed to both AI models. The models' responses were evaluated for content quality using the Journal of the American Medical Association (JAMA) criteria, DISCERN, and a 4-point Likert scale. In addition, the readability of the responses was analysed using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).

RESULTS: Neither model met the JAMA criteria. In DISCERN scoring, DeepSeek (52.81) scored significantly higher than ChatGPT (48.5) (p = 0.001). In the 4-point Likert evaluation, there was no significant difference between the two models in accuracy, clarity, or consistency (p > 0.05), but DeepSeek scored significantly higher than ChatGPT in completeness (p = 0.001). In terms of readability, ChatGPT's responses had a mean FKGL of 7.78 and a mean FRES of 52.44, while DeepSeek's had a mean FKGL of 9.90 and a mean FRES of 41.87. The difference in readability between the two models was statistically significant (FKGL, p = 0.016; FRES, p = 0.015).

CONCLUSION: Both AI models provided generally accurate and clinically relevant patient-education information on shoulder instability, despite limitations in transparency and source attribution. DeepSeek scored significantly higher on DISCERN and on the completeness criterion of the 4-point Likert scale, with no significant difference in accuracy, clarity, or consistency, whereas ChatGPT produced more readable responses. These findings suggest that AI models could serve as patient-information tools for shoulder instability, with each model having different strengths.

LEVEL OF EVIDENCE: Level V.
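The abstract reports readability as FRES and FKGL but does not give the formulas. Below is a minimal Python sketch of the standard Flesch formulas. It assumes the caller already has word, sentence, and syllable counts for a response (the abstract does not say which tool or counting rules the authors used), and the helper `fres_band` with its interpretation bands is an illustrative addition based on the commonly cited Flesch table, not part of the study.

```python
"""Standard Flesch readability formulas, computed from pre-counted text statistics.

Assumption: word, sentence and syllable counts are supplied by the caller;
the counting method used in the study is not stated in its abstract.
"""


def _check(words: int, sentences: int) -> None:
    # Both ratios divide by these counts, so they must be positive.
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): higher means easier text."""
    _check(words, sentences)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate US school grade."""
    _check(words, sentences)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fres_band(score: float) -> str:
    """Illustrative helper: map an FRES value to the commonly cited Flesch bands."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


if __name__ == "__main__":
    # Hypothetical counts for a single response (not data from the study).
    print(flesch_reading_ease(100, 5, 150), flesch_kincaid_grade(100, 5, 150))
    # Mean FRES values reported in the abstract:
    print(fres_band(52.44))  # fairly difficult (ChatGPT)
    print(fres_band(41.87))  # difficult (DeepSeek)
```

Both formulas depend on the same two ratios (words per sentence and syllables per word), with negative coefficients in FRES and positive ones in FKGL, so longer sentences and longer words push FKGL up and FRES down at the same time. That is why the higher FKGL reported for DeepSeek goes together with its lower FRES.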

Topics

Artificial Intelligence in Healthcare and Education; Shoulder Injury and Treatment; Explainable Artificial Intelligence (XAI)