This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the performance of ChatGPT in responding to questions about endoscopic procedures for patients
Citations: 16
Authors: 10
Year: 2023
Abstract
Background and Aims: We evaluated the precision, medical accuracy, superfluous content, and consistency of ChatGPT's responses to commonly asked questions about endoscopic procedures and its capability to provide emotional support, comparing its performance with the generative pretrained transformer 4 (GPT-4) model.
Methods: A set of 113 questions related to EGD, colonoscopy, EUS, and ERCP was curated from professional societies and institutional web pages. Responses from ChatGPT were generated and subsequently graded by board-certified gastroenterologists and advanced endoscopists. The emotional support efficacy of ChatGPT and GPT-4 was also assessed by a board-certified psychiatrist (L.S.-M.).
Results: ChatGPT exhibited moderate precision in answering questions about EGD (57.9% comprehensive), colonoscopy (47.6% comprehensive), EUS (48.1% comprehensive), and ERCP (44.4% comprehensive). Medical accuracy was highest for EGD (52.6% fully accurate) and lowest for EUS (40.7% fully accurate). Concerning superfluous content, responses were predominantly concise for EGD and colonoscopy, with ERCP and EUS showing increased extraneous content. Reproducibility scores varied across domains, ranging from 50.34% (for EUS) to 68.6% (for EGD). GPT-4 outperformed ChatGPT in emotional support, although both models exhibited satisfactory performance.
Conclusions: ChatGPT delivers moderately precise and medically accurate answers related to common endoscopic procedures with varying levels of extraneous content. It holds promise as a supplementary information resource for both patients and healthcare professionals.