This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating large language model's performance in answering principles of health course questions

2026 · 0 citations · Scientific Reports · Open Access

Abstract

The literature highlights the considerable potential of artificial intelligence (AI), particularly large language models (LLMs), for promoting individual health. This study aimed to evaluate the performance of several LLMs in answering questions from the Principles of Health course.

This cross-sectional study was conducted in 2025. The LLMs evaluated were ChatGPT-4o, Gemini 2.5, Copilot 2025, and Perplexity 2.250619.0. Each was used to answer the study questionnaire on the Principles of Health course. To analyze and compare their performance, a confusion matrix was constructed, and four key metrics were calculated from it: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), in addition to overall accuracy.

All of the LLMs demonstrated perfect sensitivity, each achieving a value of 1. ChatGPT and Perplexity attained the highest specificity, 0.8, while Gemini and Copilot scored lower, at 0.66 and 0.6, respectively. ChatGPT and Perplexity also recorded the highest accuracy, 0.93, surpassing Gemini and Copilot, both of which achieved 0.86.

The findings provide a detailed assessment of the LLMs' performance. Performance generally declined as the complexity, length, and verbosity of questionnaire items increased. In addition, some LLMs, such as Copilot, had particular difficulty with quantitative questions involving numerical data. Further research is recommended to investigate these observations more comprehensively.
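For readers less familiar with the metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, PPV, NPV, and accuracy are conventionally derived from the four cells of a confusion matrix. The function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study; the abstract does not report the underlying counts or say how a model's answer was mapped to each cell.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard confusion-matrix metrics from raw cell counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. How an LLM answer maps onto these cells is an
    assumption here; the abstract does not specify it.
    """

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only (NOT the study's data).
example = diagnostic_metrics(tp=10, fp=1, tn=4, fn=0)

for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

With no false negatives, sensitivity is 1 regardless of the other cells, which is why a model can pair perfect sensitivity with a lower specificity and accuracy, as the abstract reports.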

Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Clinical Reasoning and Diagnostic Skills