OpenAlex · Updated hourly · Last updated: May 15, 2026, 23:08

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Can artificial intelligence improve the readability of patient education information in gynecology?

2025 · 10 citations · American Journal of Obstetrics and Gynecology · Open Access
Open full text at the publisher

Citations: 10 · Authors: 4 · Year: 2025

Abstract

BACKGROUND: The American Medical Association recommends that patient information be written at a sixth-grade level to increase accessibility. However, most existing patient education materials exceed this threshold, posing challenges to patient comprehension. Artificial intelligence, particularly large language models, presents an opportunity to improve the readability of medical information. Despite the growing integration of artificial intelligence in healthcare, few studies have evaluated the effectiveness of large language models in generating patient education materials or improving the readability of existing ones within gynecology.

OBJECTIVE: To assess the readability and effectiveness of patient education materials generated by ChatGPT, Gemini, and CoPilot compared to those from the American College of Obstetricians and Gynecologists and UpToDate.com, and to determine whether these large language models can successfully adjust the reading level to a sixth-grade standard.

STUDY DESIGN: This cross-sectional study analyzed American College of Obstetricians and Gynecologists, UpToDate, and large language model-generated content, evaluating the large language models on 2 tasks: 1) independently generating patient education materials and 2) rewriting existing patient information to a sixth-grade reading level. All materials were assessed with basic textual analysis and 8 readability formulas. Two board-certified obstetrician-gynecologists evaluated blinded patient education materials for accuracy, clarity, and comprehension. Analysis of variance was used to compare textual analysis and readability scores, with Tukey post-hoc tests identifying differences for both original and enhanced materials. An alpha threshold of P<.004 was used to account for multiple comparisons.

RESULTS: Large language model-generated materials were significantly shorter (mean word count 407.9 vs 1132.0; P<.001) but had a higher proportion of difficult words (36.7% vs 27.4%; P<.001). American College of Obstetricians and Gynecologists and UpToDate materials averaged ninth-grade and 8.6-grade reading levels, respectively, while artificial intelligence-generated content reached a 10.6-grade level (P = .008). Although CoPilot and Gemini improved readability when prompted, no large language model reached the sixth-grade benchmark, and ChatGPT increased reading difficulty.

CONCLUSION: Large language models generated more concise patient education materials but often introduced more complex vocabulary, ultimately failing to meet recommended health literacy standards. Even when explicitly prompted, no large language model achieved the sixth-grade reading level required for optimal patient comprehension. Without proper oversight, artificial intelligence-generated patient education materials may create the illusion of simplicity while reducing true accessibility. Future efforts should focus on integrating health literacy safeguards into artificial intelligence models before clinical implementation.
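The study scores materials with 8 readability formulas, which the abstract does not name. The Flesch-Kincaid Grade Level is a standard member of such formula sets, so a minimal sketch of how grade-level scoring works is shown below; the `count_syllables` heuristic is a hypothetical simplification for illustration, not the paper's implementation.

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, subtract a silent trailing 'e'.
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1 and not word.endswith("le"):
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    # FK grade = 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short sentences of one-syllable words score near or below first grade, while long sentences of polysyllabic medical vocabulary score far above the sixth-grade target, which is the pattern the study measures.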


Topics

Artificial Intelligence in Healthcare and Education · Health Literacy and Information Accessibility · Simulation-Based Education in Healthcare