This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the efficacy of large language models in cardio-oncology patient education: a comparative analysis of accuracy, readability, and prompt engineering strategies
Citations: 0
Authors: 7
Year: 2026
Abstract
Background: The integration of large language models (LLMs) into cardio-oncology patient education holds promise for addressing the critical gap in accessible, accurate, and patient-friendly information. However, the performance of publicly available LLMs in this specialized domain remains underexplored.
Objectives: This study evaluates the performance of three LLMs (ChatGPT-4, Kimi, DouBao) acting as assistants for physicians in cardio-oncology patient education and examines the impact of prompt engineering on response quality.
Methods: Twenty standardized questions spanning cardio-oncology topics were posed twice to the three LLMs: once without a prompt and once with a directive to simplify language, generating 240 responses. Four cardio-oncology specialists evaluated these responses for accuracy, comprehensiveness, helpfulness, and practicality. Readability and complexity were assessed using a Chinese text analysis framework.
Results: Among the 240 responses, 63.3% were rated "correct," 35.0% "partially correct," and 1.7% "incorrect." No significant differences in accuracy were observed between models (p = 0.26). Kimi produced no incorrect responses. Significant declines in comprehensiveness (p = 0.03) and helpfulness (p < 0.01) occurred post-prompt, particularly for DouBao (accuracy: 57.5% vs. 7.5%, p < 0.01). Readability metrics (readability age, difficulty score, total word count, sentence length) showed no inter-model differences, but prompts reduced complexity (e.g., DouBao's readability age decreased from 12.9 ± 0.8 to 10.1 ± 1.2 years, p < 0.01).
Conclusion: Publicly available LLMs provide largely accurate responses to cardio-oncology questions, yet their utility is constrained by inconsistent comprehensiveness and sensitivity to prompt design. While simplifying language improves readability, it risks compromising clinical relevance. Tailored fine-tuning and specialized evaluation frameworks are essential to optimize LLMs for patient education in cardio-oncology.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,700 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,605 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,133 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,873 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations