This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Validity, reliability, and readability of Artificial Intelligence chatbots as public sources of information on hearing loss: a comparative evaluation of ChatGPT, Bing, Gemini, and Perplexity
Citations: 2
Authors: 6
Year: 2025
Abstract
OBJECTIVE: To assess the validity, reliability and readability of four AI chatbots as sources of hearing-health information. DESIGN AND STUDY SAMPLE: Three audiologists created 100 questions covering adult hearing loss, paediatric hearing, hearing aids, tinnitus and cochlear implants (20 each). Each question was submitted twice to ChatGPT-3.5, Bing AI, Gemini and Perplexity. Answers were scored for factual accuracy and completeness on a five-point Global Quality Score. Validity was defined using low (score ≥ 4) and high (score = 5) thresholds. Internal consistency was estimated with Cronbach's α; readability was measured with the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL). All scoring was completed independently by two blinded reviewers; discrepancies were resolved by consensus. RESULTS: Under the low threshold, ChatGPT-3.5 and Perplexity were the most valid (84% and 79%); under the high threshold, validity fell to 37% and 34%. Perplexity had the highest overall reliability (α = 0.83), yet α dropped below 0.70 for cochlear-implant, tinnitus and hearing-aid questions. 84% of outputs were rated "Difficult" or "Very Difficult" to read, and 68% read at college level. CONCLUSIONS: AI chatbots deliver generally accurate hearing-health content, but high-threshold accuracy, domain-specific reliability and readability remain suboptimal. They should supplement, not replace, professional counselling. Continued optimisation and external validation are needed before routine clinical recommendation.
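The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that the low threshold counts scores of 4 or 5 and the high threshold counts only scores of 5, that Cronbach's α is computed over repeated submissions treated as items, and that word, sentence and syllable counts are supplied by the caller. The function names are hypothetical; the α, FRES and FKGL formulas are the standard published ones.

```python
from statistics import variance


def validity_rate(scores, threshold):
    """Share of answers whose quality score meets the threshold.

    Assumption: threshold=4 models the low threshold, threshold=5 the high one.
    """
    return sum(s >= threshold for s in scores) / len(scores)


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list with one score per question.

    Assumption: each repeated submission of the question set is one item.
    Uses sample variances: alpha = k/(k-1) * (1 - sum(item var) / var(totals)).
    """
    k = len(items)
    item_variance_sum = sum(variance(item) for item in items)
    totals = [sum(row) for row in zip(*items)]  # per-question total score
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def flesch_reading_ease(words, sentences, syllables):
    """Standard FRES formula; higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard FKGL formula; approximates a US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Made-up illustrative scores, not data from the study.
    first_run = [5, 4, 3, 5, 4]
    second_run = [5, 4, 4, 5, 3]
    print(validity_rate(first_run, 4), validity_rate(first_run, 5))
    print(cronbach_alpha([first_run, second_run]))
    print(flesch_reading_ease(words=100, sentences=5, syllables=150))
    print(flesch_kincaid_grade(words=100, sentences=5, syllables=150))
```

Passing pre-computed counts to the readability functions keeps the sketch independent of any particular syllable-counting heuristic, which is where readability tools tend to differ.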
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,740 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,649 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,202 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,886 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations