This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
MP17-02 COMPARING UROLOGICAL EXPERTISE: CHATBOTS VERSUS UROLOGISTS
0
Citations
5
Authors
2024
Year
Abstract
Journal of Urology, Education Research I (MP17), 1 May 2024
Marijn Westgeest, Philip C. Weijerman, Wouter M. van Balken, Evert L. Koldewijn, and Michael R. van Balken
https://doi.org/10.1097/01.JU.0001008628.15460.84.02
INTRODUCTION AND OBJECTIVE: Artificial intelligence (AI) is technology that enables computers to perform tasks that typically require human intelligence. A new form of AI is the chatbot, designed for dialogues with users (such as physicians, students or patients), to provide translated or more comprehensible information. It is unclear whether this information is reliable and can be used for this purpose. To investigate this, we had the three largest chatbots answer urological queries.
METHODS: Mandatory questions for becoming a urologist and guideline-based questions to maintain knowledge were presented to AI: 41 from the course Female Urology 2022 (FU), 282 from the national resident examination (RE, 2021-2023) and 213 from the Dutch Urological Association guidelines (DUAG 2020-2023). The examined chatbots included ChatGPT (version 3.5, 08-19-2023), Bard (Google, 08-25-2023), and Bing Chat (Microsoft, 08-25-2023). ChatGPT is trained on data until September 2021, Bing Chat integrates ChatGPT with real-time search and Bard uses a diverse dataset for creativity. We investigated: provided answers, language, subtopic, differences between guideline-based questions and others and the cutoff/score of respondents as requested by the RE and DUA.
RESULTS: The average percentage of correct answers varied by examination [RE (54.6%), FU (60.1%), DUAG (63.4%)] and chatbot [Bard (52.2%), ChatGPT (60.1%), Bing (66.7%)]. AI performed approximately as well as the respondents of FU and DUAG but fell just short of the RE cutoff (67%). Subtopic and language appeared to have no influence, although questions formulated in Dutch scored slightly higher than English questions for ChatGPT (67.2% vs. 61.3%) and Bard (67.2% vs. 52.3%). Guideline questions were not answered better than fabricated test questions. Remarkably, guideline-based questions answered correctly by >75% of respondents were also more often answered correctly by chatbots. Conversely, questions answered correctly by <30% of respondents were also answered less accurately by AI. This could suggest that our findings currently reflect question phrasing more than knowledge.
CONCLUSIONS: The urological knowledge of AI, tested through exam and guideline questions, closely aligns with that of residents/urologists. Different question formulations and further advancements in AI may alter this in the near future.
Source of Funding: None
© 2024 by American Urological Association Education and Research, Inc.
Volume 211, Issue 5S, May 2024, Page: e290
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,740 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,649 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,202 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,886 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations