This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating ChatGPT-4 for rheumatology patient education: a comparative analysis of readability, reliability, and similarity to the American College of Rheumatology’s fact sheets
2
Citations
3
Authors
2025
Year
Abstract
Introduction: This study aimed to evaluate the readability, quality, reliability, similarity, and length of texts generated by ChatGPT on common rheumatic diseases and compare their content with American College of Rheumatology (ACR) patient education fact sheets.

Material and methods: Fifteen common rheumatic diseases were included based on the ACR fact sheets. Questions about disease characteristics, symptoms, treatments, and lifestyle recommendations were generated based on ACR content and input into ChatGPT-4 for comparison. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease (FRE), and the Simple Measure of Gobbledygook (SMOG) index. Quality and reliability were evaluated using the DISCERN questionnaire and the Ensuring Quality Information for Patients (EQIP) tool. Text similarity was measured using cosine similarity, and word count was obtained using Microsoft Word.

Results: ChatGPT-generated texts had significantly higher FKGL scores (14.3 vs. 12.7; p = 0.007) and SMOG scores (p < 0.001), indicating greater linguistic complexity. They also had lower FRE scores (35.8 vs. 43.7; p < 0.001). The mean DISCERN score for ChatGPT was significantly lower than for ACR fact sheets (46 vs. 52; p < 0.001), suggesting reduced reliability. However, no significant difference was found in EQIP quality scores (p = 0.744). Cosine similarity between ChatGPT and ACR texts averaged 0.69 (range: 0.57–0.76), indicating moderate content overlap. ChatGPT texts were more than twice as long, with a median word count of 1,109 compared to 450 for ACR materials (p < 0.001).

Conclusions: Despite the moderate similarity, ChatGPT-generated texts on rheumatic diseases were more complex, less reliable, and longer than ACR fact sheets. These findings highlight the need for improvements in artificial intelligence-driven healthcare tools to ensure readability, accuracy, and reliability, making them more aligned with expert-reviewed resources.
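To make the metrics named in the methods concrete, here is a minimal Python sketch of the three readability indices and of cosine similarity. It uses the standard published formulas for FRE, FKGL, and SMOG and takes word, sentence, and syllable counts as inputs, so no syllable-counting heuristic is involved. The function names are hypothetical. The abstract does not say how the texts were tokenized or vectorized, so the plain term-count vectors used for cosine similarity here are an assumption, not the authors' pipeline.

```python
import math
import re
from collections import Counter


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables, sentences):
    """SMOG index from the count of words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using raw term counts (an assumption)."""
    counts_a = Counter(re.findall(r"[a-z']+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z']+", text_b.lower()))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

With these definitions, a lower FRE together with a higher FKGL or SMOG points to harder text, which matches the direction of the differences reported for the ChatGPT texts. A cosine similarity of 1.0 means identical term distributions and 0.0 means no shared terms.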
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,773 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,682 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,242 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations