This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating DeepResearch and DeepThink in total knee arthroplasty patient education: ChatGPT-4o excels in comprehensiveness, DeepSeek R1 leads in clarity and readability of orthopedic information
Citations: 0
Authors: 8
Year: 2026
Abstract
OBJECTIVES: This study aims to directly compare ChatGPT and DeepSeek, both equipped with DeepResearch/DeepThink capabilities, based on their responses to frequently asked questions (FAQs) on total knee arthroplasty (TKA).
MATERIALS AND METHODS: Thirty frequently asked questions related to TKA were compiled from validated patient education sources, including American Academy of Orthopaedic Surgeons (AAOS) OrthoInfo, National Institute for Health and Care Excellence (NICE) guidelines, and popular patient discussion forums, and verified for clinical relevance by two independent arthroplasty surgeons. Two orthopedic surgeons, blinded to model identity, evaluated each response using a five-point Likert scale across five domains: accuracy, comprehensiveness, readability, relevance, and ethical and safety considerations. The maximum total score per response was 25. Readability was also assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES). Inter-rater and intra-rater reliability were calculated using intraclass correlation coefficients (ICCs).
RESULTS: ChatGPT-4o scored significantly higher in comprehensiveness and clinical detail, whereas DeepSeek R1 produced responses with superior readability, indicated by a lower FKGL (7.5 vs. 10.2) and higher FRES (62.3 vs. 45.6) (p < 0.05). Both models demonstrated high accuracy and safety, with no factual errors identified. Intra-rater reliability was excellent (ICC > 0.81), and inter-rater agreement ranged from fair to substantial (ICC 0.31 to 0.63).
CONCLUSION: Both ChatGPT-4o and DeepSeek R1 are capable of generating accurate, ethically sound, and clinically relevant educational content for patients undergoing TKA. While ChatGPT-4o offers more comprehensive information, DeepSeek R1 provides content that is more accessible to patients with lower health literacy. Model selection should be tailored to the target population to optimize educational effectiveness in clinical practice. The ability of real-time data retrieval to incorporate the most current clinical evidence and guideline updates may further enhance the educational quality, reliability, and clinical relevance of AI-generated patient information.
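For readers unfamiliar with the two readability metrics named in the abstract, the following is a minimal sketch of the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. It assumes the word, sentence, and syllable counts are already available; the abstract does not say which tool the authors used to obtain them, and the counts in the example are hypothetical, not data from the study.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for one short patient-education answer (not study data).
words, sentences, syllables = 120, 8, 180

fres = flesch_reading_ease(words, sentences, syllables)
fkgl = flesch_kincaid_grade(words, sentences, syllables)
print(f"FRES = {fres:.1f}, FKGL = {fkgl:.1f}")
```

Both formulas depend only on average sentence length and average syllables per word, which is why a lower FKGL and a higher FRES tend to move together, as in the reported 7.5 vs. 10.2 and 62.3 vs. 45.6.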