This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Generative artificial intelligence-driven chatbots and medical misinformation: an accuracy, referencing and readability audit
Citations: 1
Authors: 7
Year: 2026
Abstract
OBJECTIVES: Artificial intelligence (AI)-driven chatbots have been rapidly adopted across research, education, business, marketing and medicine. Most interactions, however, come from non-experts who use chatbots like search engines, including for everyday health and medical queries.
DESIGN: We conducted an original study to audit chatbot responses in health and medical fields prone to misinformation.
METHODS: Responses were graded on a scale from not problematic to 'highly problematic' using a coding matrix based on objective, predefined criteria. Citations were scored for accuracy and completeness, and each response was given a Flesch Reading Ease score.
RESULTS: Nearly half (49.6%) of responses were problematic: 30% somewhat problematic and 19.6% highly problematic. Response quality did not differ significantly among chatbots (p=0.566), but Grok generated significantly more highly problematic responses than would be expected under a random distribution (z-score +2.07, p=0.038). Performance was strongest in vaccines (mean z-score -2.57) and cancer (-2.12), and weakest in stem cells (+1.25), athletic performance (+3.74) and nutrition (+4.35). Chatbot outputs were consistently expressed with confidence and certainty; of 250 total questions, there were only two refusals to answer (0.8%), both from Meta AI. Reference quality was poor, with a median completeness score of 40% (Q1-Q3: 20-67%). Chatbot hallucinations and fabricated citations prevented any chatbot from producing a fully accurate reference list. All readability scores were graded as 'Difficult' (30-50), equivalent to college sophomore-to-senior level.
CONCLUSIONS: The audited chatbots performed poorly when answering questions in misinformation-prone health and medical fields. Continued deployment without public education and oversight risks amplifying misinformation.
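The readability grading described above uses the standard Flesch Reading Ease formula, FRE = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words). A minimal sketch of that computation is below; the vowel-group syllable counter is a rough heuristic assumed here for illustration, not the method the study necessarily used.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (rough heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Compute the Flesch Reading Ease score of a passage of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

On this scale, higher is easier to read; the 30-50 band reported for all chatbot responses in the audit corresponds to 'Difficult', college-level prose.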
Similar works
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller
1999 · 5,633 citations
An experiment in linguistic synthesis with a fuzzy logic controller
1975 · 5,587 citations
A FRAMEWORK FOR REPRESENTING KNOWLEDGE
1988 · 4,551 citations
Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy
2023 · 3,459 citations