This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Are AI chatbots ready for chikungunya public education? Evidence on validity, reliability, and readability
Citations: 0
Authors: 7
Year: 2026
Abstract
BACKGROUND: Chikungunya continues to expand geographically, driving demand for trustworthy, easy-to-read public guidance. Conversational AI systems are increasingly used for health information, yet their medical validity, reliability, and readability remain uneven.

METHODS: We evaluated four widely used chatbots (ChatGPT, Claude, DeepSeek, Gemini) on two task sets: (1) validity on a 50-item single-answer MCQ dataset about chikungunya; and (2) reliability and readability on 13 core public-education questions derived from Google Trends "topics" and clinician input. Reliability was scored by clinician raters using the DISCERN, EQIP, GQS, and JAMA benchmarks; readability was measured with ARI, CL, FKGL, FRES, GFI, and SMOG.

RESULTS: Across three independent runs, DeepSeek-V3.2 achieved the highest MCQ accuracy (86.7%); the other models ranged from roughly 72% to 78%. In paired brand-line comparisons, newer iterations outperformed their predecessors: ChatGPT-5 gained 5.3 percentage points in accuracy over ChatGPT-4o, and DeepSeek-V3.2 gained 8.7 points over DeepSeek-R1. Reliability scores (DISCERN, EQIP, GQS, and JAMA) differed significantly across models (p < 0.05). Pairwise tests favored DeepSeek-V3.2 on most instruments, though absolute JAMA scores were low for all systems, indicating limited transparency signals. Readability scores (ARI, CL, FKGL, FRES, GFI, and SMOG) also differed significantly (p < 0.05). DeepSeek-V3.2 produced the easiest text on average, Gemini the most complex, and no model met the sixth-grade benchmark.

CONCLUSION: Under the benchmark conditions of this study, newer chatbot iterations showed higher factual performance, and DeepSeek-V3.2 achieved the strongest task-specific answer-quality metrics. However, these findings should not be interpreted as evidence of readiness for public-health deployment or as an endorsement of any platform.
Transparency remained limited, readability exceeded recommended public-facing levels, and platform-level concerns (including privacy, data governance, content moderation, and sociopolitical acceptability) remain essential considerations. Chatbots should therefore be treated as supervised adjuncts to, not substitutes for, authoritative public-health communication.
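For context on the readability benchmarks above: FKGL and FRES are fixed formulas over sentence, word, and syllable counts, and the sixth-grade target corresponds to an FKGL of 6 or below. A minimal sketch of both formulas follows; the vowel-group syllable counter is a naive heuristic for illustration, not the validated tooling used in the study.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups. Published readability tools
    # use dictionaries and silent-e rules, so this is approximate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def _stats(text: str):
    # Split into sentences and words, then total the syllable estimate.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(W/S) + 11.8*(Syl/W) - 15.59."""
    s, w, syl = _stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59

def fres(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(W/S) - 84.6*(Syl/W)."""
    s, w, syl = _stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)
```

Short sentences of one-syllable words score well below grade 6 on FKGL (and high on FRES), while long, polysyllabic sentences score far above it, which is the gap the study's sixth-grade benchmark measures.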
Related works
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 cit.
An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller
1999 · 5,633 cit.
An experiment in linguistic synthesis with a fuzzy logic controller
1975 · 5,594 cit.
A FRAMEWORK FOR REPRESENTING KNOWLEDGE
1988 · 4,551 cit.
Opinion Paper: "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy
2023 · 3,537 cit.