This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Performance of large language models in answering frequently‐asked questions on celiac disease
Citations: 0
Authors: 7
Year: 2026
Abstract
OBJECTIVES: Celiac disease (CeD) is a common autoimmune condition requiring lifelong adherence to a gluten-free diet (GFD). Patients and caregivers increasingly seek information online, and large language models (LLMs) have emerged as potential educational tools. However, their reliability in CeD remains uncertain. This study aimed to evaluate the performance of three popular LLMs in answering frequently asked questions (FAQs) about CeD and GFD management. METHODS: We conducted a cross-sectional comparative evaluation in which 12 FAQs were submitted to three LLMs: ChatGPT-4 (OpenAI), Gemini Flash 2.5 (Google), and Claude Sonnet 3.7 (Anthropic). Six pediatric gastroenterologists with expertise in CeD research and education independently assessed and rated the responses for accuracy, completeness, clarity, and overall quality on a 5-point Likert scale. RESULTS: The mean overall score across models was 4.3 ± 0.35 out of 5. Clarity received the highest ratings (4.56 ± 0.21), followed by accuracy (4.26 ± 0.52), completeness (4.17 ± 0.21), and overall quality (4.20 ± 0.36). Responses to management-related questions scored significantly higher than responses to diagnostic questions (4.4 vs. 4.2, p = 0.013). Inter-rater reliability was good (intraclass correlation coefficient = 0.74). Overall, Gemini achieved the highest ratings (p < 0.01). CONCLUSIONS: LLMs provide clear and generally accurate responses to CeD FAQs, particularly on management-related topics. While they represent a promising tool for patient education, variability in accuracy highlights the need for clinician oversight when interpreting artificial intelligence-generated medical information.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations