This is an overview page with metadata about this scientific work. The full article is available from the publisher.
Comparison of the performance of ChatGPT-5, Gemini 3, Copilot, Perplexity, and medical students in answering neurology questions: a cross-sectional study
Citations: 0
Authors: 5
Year: 2026
Abstract
Large language model (LLM)-based chatbots have been used across various healthcare domains and have attracted substantial attention. This study aimed to evaluate and compare the performance of several LLM-based chatbots with that of medical students in answering neurology questions. This cross-sectional study was conducted in December 2025 in Iran. ChatGPT-5, Gemini 3, Copilot 2025, Perplexity, and 20 medical students responded to a neurology questionnaire. A confusion matrix was used to analyze the data: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy were calculated. In addition, correlation analyses examined the relationship between chatbot performance and question characteristics (word count, context, format, type, modality, language). The evaluated chatbots significantly outperformed the medical students on the overall performance metrics (p < 0.001). Among the chatbots, Copilot achieved the highest accuracy (0.88), followed by ChatGPT-5 (0.86). Meanwhile, quantitative question types were associated with a significant reduction in chatbot performance (r = 0.470, p = 0.001). The findings offer insights particularly pertinent to neurology, where chatbots can serve as supplementary tools for practitioners, enhancing diagnostic accuracy and clinical decision-making while adhering to established ethical standards. However, further research is required to provide more precise insights, particularly with a larger sample of human participants.
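The five metrics named in the abstract follow from the four cells of a 2x2 confusion matrix. The sketch below shows the standard definitions only. The abstract does not say how responses were mapped to true/false positives and negatives, so the class name, function names, and counts here are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cells of a 2x2 confusion matrix (hypothetical counts, not study data)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute the standard metrics derived from a 2x2 confusion matrix."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "sensitivity": ratio(cm.tp, cm.tp + cm.fn),  # TP / (TP + FN)
        "specificity": ratio(cm.tn, cm.tn + cm.fp),  # TN / (TN + FP)
        "ppv": ratio(cm.tp, cm.tp + cm.fp),          # TP / (TP + FP)
        "npv": ratio(cm.tn, cm.tn + cm.fn),          # TN / (TN + FN)
        "accuracy": ratio(cm.tp + cm.tn, total),     # (TP + TN) / all
    }


if __name__ == "__main__":
    # Illustrative counts only.
    example = ConfusionMatrix(tp=40, fp=5, tn=45, fn=10)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Returning NaN for an empty denominator is a design choice in this sketch, so that a missing class (for example, no negative cases) is visible rather than silently reported as zero.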
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations