This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating ChatGPT-4 for rheumatology patient education: a comparative analysis of readability, reliability, and similarity to the American College of Rheumatology’s fact sheets

2025 · 2 citations · Reumatologia/Rheumatology · Open Access

Abstract

Introduction: This study aimed to evaluate the readability, quality, reliability, similarity, and length of texts generated by ChatGPT on common rheumatic diseases and compare their content with American College of Rheumatology (ACR) patient education fact sheets.

Material and methods: Fifteen common rheumatic diseases were included based on the ACR fact sheets. Questions about disease characteristics, symptoms, treatments, and lifestyle recommendations were generated based on ACR content and input into ChatGPT-4 for comparison. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease (FRE), and the Simple Measure of Gobbledygook (SMOG) index. Quality and reliability were evaluated using the DISCERN questionnaire and the Ensuring Quality Information for Patients (EQIP) tool. Text similarity was measured using cosine similarity, and word count was obtained using Microsoft Word.

Results: ChatGPT-generated texts had significantly higher FKGL scores (14.3 vs. 12.7; p = 0.007) and SMOG scores (p < 0.001), indicating greater linguistic complexity. They also had lower FRE scores (35.8 vs. 43.7; p < 0.001). The mean DISCERN score for ChatGPT was significantly lower than for ACR fact sheets (46 vs. 52; p < 0.001), suggesting reduced reliability. However, no significant difference was found in EQIP quality scores (p = 0.744). Cosine similarity between ChatGPT and ACR texts averaged 0.69 (range: 0.57–0.76), indicating moderate content overlap. ChatGPT texts were more than twice as long, with a median word count of 1,109 compared to 450 for ACR materials (p < 0.001).

Conclusions: Despite the moderate similarity, ChatGPT-generated texts on rheumatic diseases were more complex, less reliable, and longer than ACR fact sheets. These findings highlight the need for improvements in artificial intelligence-driven healthcare tools to ensure readability, accuracy, and reliability, making them more aligned with expert-reviewed resources.
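The readability and similarity metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the standard published formulas for FRE, FKGL, and SMOG, but several details are assumptions that the abstract does not specify: the syllable counter is a crude vowel-group heuristic, sentences are split on terminal punctuation, and cosine similarity is computed over raw word-count vectors (the study may have used a different text representation, such as TF-IDF or embeddings). All function names are hypothetical, and the scores will not exactly match those of dedicated readability tools.

```python
import math
import re
from collections import Counter


def count_syllables(word: str) -> int:
    """Assumed heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_stats(text: str) -> tuple[int, int, int, int]:
    """Return (sentences, words, syllables, words with 3+ syllables)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)
    return sentences, len(words), sum(syllables), polysyllables


def flesch_reading_ease(text: str) -> float:
    """FRE: higher scores mean easier text."""
    s, w, syl, _ = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    """FKGL: approximate US school grade needed to read the text."""
    s, w, syl, _ = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


def smog_index(text: str) -> float:
    """SMOG: grade estimate based on polysyllabic word density."""
    s, _, _, poly = text_stats(text)
    return 1.0430 * math.sqrt(poly * 30 / s) + 3.1291


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between word-count vectors (assumed representation)."""
    va = Counter(re.findall(r"[a-z']+", text_a.lower()))
    vb = Counter(re.findall(r"[a-z']+", text_b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0
```

Under these formulas, longer sentences and more syllables per word lower the FRE and raise the FKGL, which matches the direction of the differences reported for the ChatGPT texts. A cosine similarity near 1 indicates nearly identical word distributions, and a value near 0 indicates almost no shared vocabulary.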

Topics

Artificial Intelligence in Healthcare and Education
Clinical Reasoning and Diagnostic Skills
Rheumatoid Arthritis Research and Therapies
