This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Comparing Physician and Artificial Intelligence Chatbot Responses to Post-hysterectomy Questions Posted to a Public Social Media Forum
Citations: 0
Authors: 5
Year: 2025
Abstract
INTRODUCTION: In public online forums, patients often seek reassurance and guidance from the community regarding postoperative symptoms and expectations, and when to seek medical assistance. Others turn to artificial intelligence (AI) in the form of online search engines or chatbots such as ChatGPT or Perplexity. AI chatbot assistants have been growing in popularity; however, clinicians may be hesitant to use them due to concerns about accuracy. Doximity, an online networking service for medical professionals, has expanded its resources to include a HIPAA-compliant AI writing assistant, DoximityGPT, designed to reduce the administrative burden on clinicians. Health professionals are trained under a "medical model," which differs greatly from the "health belief model" through which lay people understand health information. This mismatch in perspective likely contributes to communication gaps even during digital clinician-patient encounters, particularly for patients with limited health literacy during the perioperative period, when complications may arise.

OBJECTIVE: To evaluate and compare the readability of responses to postoperative patient queries generated by three AI chatbot assistants (DoximityGPT, Perplexity, and ChatGPT) and a MIGS-trained surgeon.

METHODS: Responses by three AI chatbots and a physician to post-hysterectomy patient questions sourced from a public online forum were evaluated for reading ease using Flesch-Kincaid scoring. The Flesch-Kincaid calculator incorporates elements such as sentence and word length to determine grade-level readability scores. Descriptive statistics were used to summarize the readability of the responses.

RESULTS: The mean reading ease of the patient questions (n=10) was 76.24±9.11, corresponding to a 7th-grade reading level and ranging from 5th grade to 10th to 12th grade. Overall, the response reading level ranged from 7th grade to college graduate.
The reading levels of DoximityGPT responses were 8th and 9th grade (n=2, 20%), 10th to 12th grade (n=2, 20%), and college (n=6, 60%). The reading levels of Perplexity responses were 10th to 12th grade (n=1, 10%), college (n=5, 50%), and college graduate (n=3, 30%). The reading levels of physician responses were 7th grade (n=2, 20%), 8th and 9th grade (n=3, 30%), 10th to 12th grade (n=4, 40%), and college (n=1, 10%). The reading levels of ChatGPT responses were 10th to 12th grade (n=2, 20%), college (n=4, 40%), and college graduate (n=4, 40%). The mean reading ease scores for DoximityGPT, Perplexity, the physician, and ChatGPT were 47.76±12.08 (college), 39.21±8.10 (college), 59.73±12.27 (10th to 12th grade), and 36.45±8.97 (college), respectively. The mean word counts were 170.3±53.93, 226±42, 105.6±39.99, and 254.9±73.2 for DoximityGPT, Perplexity, the physician, and ChatGPT, respectively.

CONCLUSIONS: All responses were assessed to be above the recommended reading level for patient education materials of 6th grade or below. While the accuracy of patient education material is crucial for patient safety, the readability of patient information is often significantly higher than the general patient population's health literacy level, resulting in poor comprehension of the information provided. This analysis serves as a reminder for surgeons to be mindful of this mismatch between readability and general health literacy when considering the integration of AI chatbot assistants into patient care.
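The Flesch Reading Ease score referenced in the methods weights average sentence length (words per sentence) and average word length (syllables per word). A minimal Python sketch of the standard formula is below; note that the vowel-group syllable counter is a rough approximation introduced here for illustration, whereas published Flesch-Kincaid calculators typically use dictionary-based syllable counts, so exact scores will differ.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels.
    # This undercounts/overcounts some words (e.g. silent "e").
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease:
    #   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    # Higher scores mean easier text; 90-100 ~ 5th grade, 0-30 ~ college graduate.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short, monosyllabic sentences score high on this scale, while long sentences with polysyllabic words (common in clinical language) can score near or below zero, which is the mismatch the study quantifies.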
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,578 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,470 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,984 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,814 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations