This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Question–Answer Readability Alignment in Patient Education: A Comparative Evaluation of Prompting Strategies in Large Language Models (Preprint)

2026 · 0 citations · Open Access

Abstract

BACKGROUND: Accessible health communication is essential for effective patient education, yet most medical information is written above the reading level recommended for the general population. Large language models (LLMs) can simplify complex medical information, but it remains unclear whether their responses can be automatically adapted to match the linguistic complexity of patient questions.

OBJECTIVE: To evaluate whether LLM-generated responses to patient questions align with the reading level of those questions, and to compare prompting strategies for readability control across multiple LLM architectures, including their impact on response quality, computational cost, and efficiency.

METHODS: A total of 400 real patient questions, collected from public YouTube comments on ophthalmology-related topics, were categorized into three readability tiers based on the Flesch Reading Ease Score. Responses generated by four LLMs (GPT 5.2, Gemini 3, Claude 4.5 Opus, and OSS-20b) were compared in a controlled evaluation under four prompting conditions: Baseline (no readability control), Formula-Informed Readability Prompt, Semantic Readability Inference, and External Calculator–Guided Prompt. Agreement between question and response readability was assessed using quadratic-weighted Cohen's kappa. Response quality was evaluated using BERTScore and LLM-as-a-judge assessments. Computational efficiency was evaluated through token usage, estimated cost per response, and latency.

RESULTS: The external calculator–guided strategy produced the strongest readability alignment across models, achieving the highest agreement (Gemini 3: κ=0.94; GPT 5.2: κ=0.90) and the most favorable computational efficiency. Formula-informed prompting showed intermediate performance, whereas semantic inference showed poor agreement because the models' readability estimates were unreliable. High-performing models (GPT 5.2 and Gemini 3) maintained very high factual accuracy (>98%) across strategies, indicating that readability control did not substantially degrade correctness. In contrast, the open-source OSS-20b model showed lower accuracy (83–90%) and greater sensitivity to readability constraints. Semantic similarity remained high across models (BERTScore F1 >0.87). Longer questions were associated with slightly lower accuracy, particularly in lower-performing configurations.

CONCLUSIONS: Aligning LLM responses to the readability of patient questions is feasible and can be reliably achieved using externally guided readability signals. Hybrid pipelines that combine traditional readability metrics with LLM-generated content may therefore represent a practical approach for scalable, patient-centered health communication systems.
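The agreement measure named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the tier cutoffs (30 and 60) are assumed because the abstract does not state them, and word, sentence, and syllable counts are passed in directly rather than computed from text. Only the Flesch Reading Ease formula and the definition of quadratic-weighted Cohen's kappa are standard.

```python
from collections import Counter


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula, computed from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def tier(score: float, cutoffs=(30.0, 60.0)) -> int:
    """Map a score to 0 (hard), 1 (medium), or 2 (easy).

    The cutoffs are an assumption for illustration; the abstract does not
    report the boundaries used in the study.
    """
    low, high = cutoffs
    if score < low:
        return 0
    if score < high:
        return 1
    return 2


def quadratic_weighted_kappa(a, b, k: int = 3) -> float:
    """Quadratic-weighted Cohen's kappa for two lists of tiers in 0..k-1."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(x not in range(k) for x in list(a) + list(b)):
        raise ValueError("tiers must be integers in 0..k-1")
    n = len(a)
    observed = Counter(zip(a, b))  # joint counts of (question tier, response tier)
    row, col = Counter(a), Counter(b)  # marginal counts per rater
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            num += w * observed[(i, j)] / n  # observed weighted disagreement
            den += w * row[i] * col[j] / (n * n)  # disagreement expected by chance
    if den == 0.0:
        return float("nan")  # undefined when both lists use one identical tier
    return 1.0 - num / den


# Illustrative tiers only, not data from the study.
question_tiers = [0, 1, 2, 0, 1, 2]
response_tiers = [0, 1, 2, 0, 1, 2]
kappa = quadratic_weighted_kappa(question_tiers, response_tiers)
```

With identical tier lists, as in the illustrative call, kappa is 1.0; the quadratic weights penalize a two-tier mismatch four times as heavily as a one-tier mismatch.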

Topics

Health Literacy and Information Accessibility; Artificial Intelligence in Healthcare and Education; Text Readability and Simplification
