This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparative assessment of artificial intelligence chatbots' performance in responding to healthcare professionals' and caregivers' questions about Dravet syndrome

2025 · 3 citations · Epilepsia Open · Open Access

Abstract

OBJECTIVE: Artificial intelligence chatbots have been a game changer in healthcare, providing immediate, round-the-clock assistance. However, their accuracy across specific medical domains remains under-evaluated. Dravet syndrome remains one of the most challenging epileptic encephalopathies, with new data continuously emerging in the literature. This study aims to evaluate and compare the performance of ChatGPT 3.5 and Perplexity in responding to questions about Dravet syndrome.
METHODS: We curated 96 questions about Dravet syndrome, 43 from healthcare professionals and 53 from caregivers. Two epileptologists independently graded the chatbots' responses, with a third senior epileptologist resolving any disagreements to reach a final consensus. Accuracy and completeness of correct answers were rated on predefined 3-point scales. Incorrect responses were prompted for self-correction and re-evaluated. Readability was assessed using Flesch reading ease and Flesch-Kincaid grade level.
RESULTS: = 7.27, p = 0.026). The topic with the poorest performance was Dravet syndrome's treatment, particularly for healthcare professional questions. Both models exhibited exemplary completeness, with most responses rated as "complete" to "comprehensive" (ChatGPT 3.5: 73.4%, Perplexity: 75.7%). Substantial self-correction capabilities were observed: ChatGPT 3.5 improved 55.6% of responses and Perplexity 80%. The texts were generally very difficult to read, requiring an advanced reading level. However, Perplexity's responses were significantly more readable than ChatGPT 3.5's [Flesch reading ease: 29.0 (SD 13.9) vs. 24.1 (SD 15.0), p = 0.018].
SIGNIFICANCE: Our findings underscore the potential of AI chatbots in delivering accurate and complete responses to Dravet syndrome queries. However, they have limitations, particularly in complex areas like treatment. Continuous efforts to update information and improve readability are essential.
PLAIN LANGUAGE SUMMARY: Artificial intelligence chatbots have the potential to improve access to medical information, including on conditions like Dravet syndrome, but the quality of this information is still unclear. In this study, ChatGPT 3.5 and Perplexity correctly answered most questions from healthcare professionals and caregivers, with ChatGPT 3.5 performing better for caregivers. Treatment-related questions had the most incorrect answers, particularly those from healthcare professionals. Both chatbots demonstrated the ability to correct previously incorrect responses, particularly Perplexity. Both chatbots produced text requiring advanced reading skills. Further improvements are needed to make the text easier to understand and to address difficult medical topics.
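
The abstract names two readability metrics, Flesch reading ease and Flesch-Kincaid grade level, without saying which tool computed them. As a minimal sketch, the standard published formulas can be written as below. The function names, the count-based interface, and the band labels (the conventional Flesch interpretation scale) are assumptions for illustration and are not taken from the study.

```python
# Sketch of the standard readability formulas; not the study's own tooling.
# Inputs are raw counts, so no particular syllable counter is assumed.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch reading ease: higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: approximate US school grade needed."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def reading_ease_band(score: float) -> str:
    """Map a reading ease score to the conventional Flesch bands."""
    if score < 30:
        return "very difficult"
    if score < 50:
        return "difficult"
    if score < 60:
        return "fairly difficult"
    if score < 70:
        return "standard"
    if score < 80:
        return "fairly easy"
    if score < 90:
        return "easy"
    return "very easy"


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
    # Mean scores reported in the abstract for the two chatbots.
    print(reading_ease_band(29.0), reading_ease_band(24.1))
```

Under these conventional bands, both reported mean scores (29.0 for Perplexity, 24.1 for ChatGPT 3.5) fall below 30, which matches the abstract's description of the texts as very difficult to read.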

Topics

Artificial Intelligence in Healthcare and Education
Epilepsy research and treatment
EEG and Brain-Computer Interfaces
