OpenAlex · Updated hourly · Last updated: 29.04.2026, 08:16

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

A Cross-Sectional Study Evaluating the Quality of AI-Generated Patient Education Guides on Diet and Exercise for Diabetes, Hypertension, and Obesity Using ChatGPT-4o, Google Gemini 1.5, Claude Sonnet 4, Perplexity, and Grok

2026 · 0 citations · Cureus · Open Access
Open full text at the publisher

Citations: 0
Authors: 7
Year: 2026

Abstract

Background: Diabetes, obesity, and hypertension are common chronic conditions in which lifestyle modification is central to disease control. Educational materials may accompany these interventions, but they are helpful only if clear and credible. Since the advent of large language models (LLMs) such as ChatGPT and Google Gemini, their potential to contribute to the production of health education materials deserves serious consideration. Aim: The aim of this study is to conduct a cross-sectional comparison of five LLMs (ChatGPT-4o, Google Gemini 2.5, Claude Sonnet 4, Grok 3, and Perplexity) in generating patient education brochures on diet and exercise for diabetes, hypertension, and obesity, evaluating their readability, originality, and reliability. Primary objective: To compare the readability and reliability of AI-generated patient education materials. Secondary objective: To assess the lexical complexity and originality of the generated content. Methods: This cross-sectional study used standardized questions to generate brochures from each response provided by the LLMs. Outputs were evaluated for readability (Flesch-Kincaid test), word complexity, originality (PapersOwl plagiarism software), and reliability (modified DISCERN instrument). Descriptive statistics and one-way ANOVA were used, with p < 0.05 deemed significant. Results: ChatGPT produced the shortest and most readable content, with the lowest grade level (5.2 ± 0.8) and the highest Flesch Reading Ease score (70.0 ± 5.1). Gemini and Claude produced longer, more elaborate brochures that received higher reliability ratings (3.0 ± 0.0 and 3.0 ± 1.0, respectively) but were written at higher reading levels (≈ 9th grade). Grok scored mid-range on all measures, while Perplexity produced shorter responses (≈ 444 words) but the lowest reliability score (1.3 ± 0.6). There were no significant differences in originality scores across the tools. Conclusion: Each model demonstrated at least one parameter on which it outperformed the others: ChatGPT in readability, Gemini and Claude in reliability, Grok in balance, and Perplexity in conciseness. The results support the use of LLMs to produce patient-comprehensible leaflets, but human editing, updating against the latest guidelines, and human supervision will be needed before clinical application.
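For context, the readability metrics named in the abstract follow standard published formulas: Flesch Reading Ease = 206.835 − 1.015 × (words/sentence) − 84.6 × (syllables/word), and Flesch-Kincaid Grade Level = 0.39 × (words/sentence) + 11.8 × (syllables/word) − 15.59. The sketch below is a minimal illustration of how such an evaluation pipeline might be scripted, not the authors' code: the syllable counter is a rough heuristic, and the per-model scores fed to the ANOVA are made-up placeholder values, not the study's data.

# Minimal sketch of a readability + ANOVA pipeline (illustrative, not the study's code).
import re
from scipy.stats import f_oneway

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_metrics(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) using the standard formulas."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # words per sentence
    spw = syllables / len(words)   # syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade

# Hypothetical per-brochure reading-ease scores for three models
# (placeholder values for illustration only):
chatgpt = [72.1, 68.4, 69.5]
gemini = [58.0, 61.2, 59.7]
claude = [60.3, 57.8, 62.1]

# One-way ANOVA across models, with p < 0.05 deemed significant as in the study.
stat, p = f_oneway(chatgpt, gemini, claude)
print(f"F = {stat:.2f}, p = {p:.4f}")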

Topics

Artificial Intelligence in Healthcare and Education · Health Literacy and Information Accessibility · Social Media in Health Education