This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Are AI chatbots ready for chikungunya public education? Evidence on validity, reliability, and readability
Citations: 0
Authors: 7
Year: 2026
Abstract
BACKGROUND: Chikungunya continues to expand geographically, driving demand for trustworthy, easy-to-read public guidance. Conversational AI systems are increasingly used for health information, yet their medical validity, reliability, and readability remain uneven.

METHODS: We evaluated four widely used chatbots (ChatGPT, Claude, DeepSeek, Gemini) on two task sets: (1) validity on a 50-item single-answer MCQ dataset about chikungunya; and (2) reliability and readability on 13 core public-education questions derived from Google Trends "topics" and clinician input. Reliability was scored by clinician raters using the DISCERN, EQIP, GQS, and JAMA benchmarks; readability was measured with ARI, CL, FKGL, FRES, GFI, and SMOG.

RESULTS: Across three independent runs, DeepSeek-V3.2 achieved the highest MCQ accuracy (86.7%); the other models ranged from roughly 72% to 78%. In paired brand-line comparisons, newer iterations outperformed their predecessors: ChatGPT-5 gained 5.3 percentage points in accuracy over ChatGPT-4o, and DeepSeek-V3.2 gained 8.7 points over DeepSeek-R1. Reliability scores (DISCERN, EQIP, GQS, and JAMA) differed significantly across models (p < 0.05). Pairwise tests favored DeepSeek-V3.2 on most instruments, though absolute JAMA scores were low for all systems, indicating limited transparency signals. Readability scores (ARI, CL, FKGL, FRES, GFI, and SMOG) also differed significantly (p < 0.05). DeepSeek-V3.2 produced the easiest text on average, Gemini the most complex, and no model met the sixth-grade benchmark.

CONCLUSION: Under the benchmark conditions of this study, newer chatbot iterations showed higher factual performance, and DeepSeek-V3.2 achieved the strongest task-specific answer-quality metrics. However, these findings should not be interpreted as evidence of readiness for public-health deployment or as an endorsement of any platform.
Transparency remained limited, readability exceeded recommended public-facing levels, and platform-level concerns (including privacy, data governance, content moderation, and sociopolitical acceptability) remain essential considerations. Chatbots should therefore be treated as supervised adjuncts to, not substitutes for, authoritative public-health communication.
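For context on the readability benchmarks above: FKGL and FRES are fixed formulas over sentence, word, and syllable counts, and the sixth-grade target corresponds to an FKGL of 6 or below. A minimal sketch of both formulas follows; the vowel-group syllable counter is a naive heuristic for illustration, not the validated tooling used in the study.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups. Published readability tools
    # use dictionaries and silent-e rules, so this is approximate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def _stats(text: str):
    # Split into sentences and words, then total the syllable estimate.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(W/S) + 11.8*(Syl/W) - 15.59."""
    s, w, syl = _stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59

def fres(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(W/S) - 84.6*(Syl/W)."""
    s, w, syl = _stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)
```

Short sentences of one-syllable words score well below grade 6 on FKGL (and high on FRES), while long, polysyllabic sentences score far above it, which is the gap the study's sixth-grade benchmark measures.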
Related works
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 cit.
An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller
1999 · 5,633 cit.
An experiment in linguistic synthesis with a fuzzy logic controller
1975 · 5,594 cit.
A FRAMEWORK FOR REPRESENTING KNOWLEDGE
1988 · 4,551 cit.
Opinion Paper: "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy
2023 · 3,537 cit.