This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of the usability of ChatGPT-4 Pro and Gemini 2.5 Pro in patient education about brain tumors
0
Citations
5
Authors
2025
Year
Abstract
Aim: The aim of this study was to determine the reliability of the ChatGPT-4 Pro and Gemini 2.5 Pro chatbots through a systematic evaluation, by neurosurgical specialists, of the responses these models provide to patients' questions about brain tumors. Methods: The study was conducted using artificial intelligence programs; the authors have no relationship with the AI companies or sites associated with the study. A total of 56 frequently asked questions were identified, and ChatGPT-4 Pro and Gemini 2.5 Pro were examined. The responses from both models were produced in Turkish and scored by two independent neurosurgeons, each blinded to the other's evaluation. When both neurosurgeons assigned the same score to a response, that score was accepted as final; in cases of disagreement, a consensus score was reached through discussion and documented. Results: The distribution of the questions is shown in Table 2. For anatomy questions, the mean GPT score was 4.25 ± 0.88 and the mean GEMINI score was 4.50 ± 0.53 (p = 0.282). For general questions, the mean GPT score was 4.43 ± 0.81 and the mean GEMINI score was 4.38 ± 0.81 (p = 0.500). For questions on prognosis and activities of daily living, the mean GPT score was 5.00 ± 0.00 and the mean GEMINI score was 4.57 ± 0.78 (p = 0.100). For treatment questions, the mean GPT score was 4.63 ± 0.67 and the mean GEMINI score was 4.36 ± 0.67 (p = 0.138). Figure 1 presents a comparison by question group. Overall, GPT and GEMINI performed similarly, with mean scores of 4.54 and 4.44, respectively.
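The per-category summaries above are standard mean ± sample standard deviation statistics over 1–5 evaluator scores. As a minimal illustrative sketch (the score lists below are hypothetical, not the study's data), such summaries can be computed as follows:

```python
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) of a list of 1-5 scores,
    rounded to two decimals as in the reported results."""
    return round(mean(scores), 2), round(stdev(scores), 2)

# Hypothetical scores for one question category (not the study's data)
gpt_scores    = [5, 4, 5, 3, 4, 5, 4, 5]
gemini_scores = [5, 4, 4, 5, 5, 4, 5, 4]

print(summarize(gpt_scores))     # (4.38, 0.74)
print(summarize(gemini_scores))  # (4.5, 0.53)
```

The abstract does not name the statistical test behind the reported p-values, so only the descriptive statistics are sketched here.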
Conclusion: This study demonstrates that large language model (LLM) technologies such as ChatGPT-4 Pro and Gemini 2.5 Pro show considerable promise for providing information and guidance to patients. However, neither model demonstrated substantial superiority over the other in educating patients about brain tumors.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations