OpenAlex · Updated hourly · Last updated: 02.05.2026, 16:03

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating deepresearch and deepthink in total knee arthroplasty patient education: ChatGPT-4o excels in comprehensiveness, Deepseek R1 leads in clarity and readability of orthopedic information

2026 · 0 citations · Joint Diseases and Related Surgery · Open Access

0 citations · 8 authors · Year: 2026

Abstract

OBJECTIVES: This study aims to directly compare ChatGPT and DeepSeek, both equipped with DeepResearch/DeepThink capabilities, based on their responses to frequently asked questions (FAQs) on total knee arthroplasty (TKA).

MATERIALS AND METHODS: Thirty frequently asked questions related to TKA were compiled from validated patient education sources, including American Academy of Orthopaedic Surgeons (AAOS) OrthoInfo, National Institute for Health and Care Excellence (NICE) guidelines, and popular patient discussion forums, and verified for clinical relevance by two independent arthroplasty surgeons. Two orthopedic surgeons, blinded to model identity, evaluated each response using a five-point Likert scale across five domains: accuracy, comprehensiveness, readability, relevance, and ethical and safety considerations. The maximum total score per response was 25. Readability was also assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES). Inter-rater and intra-rater reliability were calculated using intraclass correlation coefficients (ICCs).

RESULTS: ChatGPT-4o scored significantly higher in comprehensiveness and clinical detail, whereas DeepSeek R1 produced responses with superior readability, indicated by a lower FKGL (7.5 vs. 10.2) and higher FRES (62.3 vs. 45.6) (p < 0.05). Both models demonstrated high accuracy and safety, with no factual errors identified. Intra-rater reliability was excellent (ICC > 0.81), and inter-rater agreement ranged from fair to substantial (ICC 0.31 to 0.63).

CONCLUSION: Both ChatGPT-4o and DeepSeek R1 are capable of generating accurate, ethically sound, and clinically relevant educational content for patients undergoing TKA. While ChatGPT-4o offers more comprehensive information, DeepSeek R1 provides content that is more accessible to patients with lower health literacy. Model selection should be tailored to the target population to optimize educational effectiveness in clinical practice.
The ability of real-time data retrieval to incorporate the most current clinical evidence and guideline updates may further enhance the educational quality, reliability, and clinical relevance of AI-generated patient information.
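The FKGL and FRES metrics cited above are standard readability formulas based on word, sentence, and syllable counts. As a minimal sketch of how such scores are computed (the counts in the example are hypothetical, not taken from the study's data):

```python
def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade
    needed to understand the text (higher = harder to read)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def fres(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score on a roughly 0-100 scale
    (higher = easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Hypothetical counts for a short patient-education answer
print(round(fkgl(100, 8, 140), 1))  # grade-level estimate
print(round(fres(100, 8, 140), 1))  # reading-ease estimate
```

Note that the two scales run in opposite directions, which is why DeepSeek R1's lower FKGL (7.5 vs. 10.2) and higher FRES (62.3 vs. 45.6) both point to easier reading.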
