This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Comparison of the performance of ChatGPT-5, Gemini 3, Copilot, Perplexity, and medical students in answering neurology questions: a cross-sectional study

2026 · 0 citations · Scientific Reports · Open Access

Abstract

Large language model (LLM)-based chatbots have been used across various healthcare domains and have attracted substantial attention. This study aimed to evaluate and compare the performance of several LLM-based chatbots with that of medical students in responding to neurology questions. In this cross-sectional study, conducted in December 2025 in Iran, ChatGPT-5, Gemini 3, Copilot 2025, Perplexity, and 20 medical students responded to a neurology questionnaire. A confusion matrix was used to analyze the data, and four metrics were calculated from it: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), along with overall accuracy. Moreover, correlation analyses examined chatbot performance against question characteristics (word count, context, format, type, modality, language). The chatbots significantly outperformed the medical students on the overall performance metrics (p < 0.001). Among the chatbots, Copilot achieved the highest accuracy (0.88), followed by ChatGPT-5 (0.86). Meanwhile, quantitative question types were associated with a significant reduction in chatbot performance (r = 0.470, p = 0.001). The findings offer valuable insights particularly pertinent to neurology, where chatbots can serve as supplementary tools for practitioners, enhancing diagnostic accuracy and clinical decision-making while adhering to established ethical standards. However, further research is required to provide more precise insights, particularly with a larger sample of human participants.
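
For readers unfamiliar with the metrics named in the abstract, the following is a minimal Python sketch of how sensitivity, specificity, PPV, NPV, and accuracy are conventionally derived from the four cells of a confusion matrix. The abstract does not state how individual answers were mapped to true/false positives and negatives, so the `ConfusionCounts` type, the `diagnostic_metrics` function, and the example counts are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix (hypothetical container, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard metrics from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # TP / (TP + FN)
        "specificity": _ratio(c.tn, c.tn + c.fp),  # TN / (TN + FP)
        "ppv": _ratio(c.tp, c.tp + c.fp),          # TP / (TP + FP)
        "npv": _ratio(c.tn, c.tn + c.fn),          # TN / (TN + FN)
        "accuracy": _ratio(c.tp + c.tn, total),    # (TP + TN) / all
    }


# Made-up counts for illustration only; these are NOT the study's data.
example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Returning NaN for an empty denominator is one possible design choice; it keeps a responder with, say, no positive predictions from raising an error, at the cost of having to handle NaN when aggregating.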

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Radiology practices and education
