This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Artificial intelligence‐generated patient information on shoulder instability remains suboptimal: DeepSeek outperforms ChatGPT in completeness of content while ChatGPT is more readable
Citations: 0
Authors: 3
Year: 2026
Abstract
PURPOSE: This study aimed to evaluate and compare the performance of the Chat Generative Pre-Trained Transformer (ChatGPT) and DeepSeek artificial intelligence (AI) models in providing patient information on shoulder instability.
METHODS: Sixteen frequently asked questions related to shoulder instability were posed to both AI models. The models' responses were evaluated for content quality using the Journal of the American Medical Association (JAMA) criteria, the DISCERN instrument, and 4-point Likert scales. In addition, the readability of the responses was analysed using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).
RESULTS: Neither model met the JAMA criteria. In the DISCERN scoring, DeepSeek (52.81) scored significantly higher than ChatGPT (48.5) (p = 0.001). In the 4-point Likert evaluation, there was no significant difference between the two models in the accuracy, clarity, and consistency criteria (p > 0.05), but DeepSeek scored significantly higher than ChatGPT in the completeness criterion (p = 0.001). In terms of readability, ChatGPT had a mean FKGL of 7.78 and a mean FRES of 52.44, while DeepSeek had an FKGL of 9.90 and an FRES of 41.87. The difference in readability between the two models was statistically significant (FKGL, p = 0.016; FRES, p = 0.015).
CONCLUSION: Both AI models provided generally accurate and clinically relevant information for patient education on shoulder instability, despite limitations in transparency and source attribution. DeepSeek scored significantly higher in DISCERN and in the completeness criterion of the 4-point Likert scale, while there was no significant difference in accuracy, clarity, and consistency. ChatGPT demonstrated better readability. These findings suggest that AI models have the potential to serve as tools for patient information on shoulder instability, with each model having different strengths.
LEVEL OF EVIDENCE: Level V.
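The two readability metrics named in the abstract are defined by standard published formulas over word, sentence, and syllable counts. The sketch below shows those formulas in Python. The abstract does not say which tool the authors used, so the function names and the `crude_counts` helper (a rough vowel-group syllable heuristic) are illustrative assumptions, not the study's method.

```python
import re

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease Score: higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade, lower is easier."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def crude_counts(text):
    """Hypothetical helper: rough (words, sentences, syllables) counts.

    Assumption: syllables are approximated as runs of vowels per word,
    which over- or under-counts for many English words. Real readability
    tools use more careful rules. Expects non-empty English text.
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    tokens = re.findall(r"[A-Za-z]+", text)
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", token.lower()))) for token in tokens
    )
    return len(tokens), sentences, syllables

if __name__ == "__main__":
    counts = crude_counts("The shoulder is stable. It moves without pain.")
    print(flesch_reading_ease(*counts), flesch_kincaid_grade(*counts))
```

Read against these formulas, the reported values point the same way on both scales: ChatGPT's higher FRES (52.44 versus 41.87) and lower FKGL (7.78 versus 9.90) both indicate text that is easier to read than DeepSeek's.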