Evaluating large language models' performance in answering Principles of Health course questions
Citations: 0
Authors: 3
Year: 2026
Abstract
Introduction: The literature highlights the considerable potential of artificial intelligence (AI), particularly large language models (LLMs), in advancing health promotion among individuals. This study aimed to evaluate how well several LLMs answer questions from the Principles of Health course.

This cross-sectional study was conducted in 2025. The LLMs evaluated were ChatGPT-4o, Gemini 2.5, Copilot 2025, and Perplexity 2.250619.0. Each LLM answered the study questionnaire on the Principles of Health course. To analyze and compare their performance, a confusion matrix was constructed, from which four key metrics were calculated: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), along with overall accuracy.

All four LLMs achieved perfect sensitivity (1.0). ChatGPT and Perplexity attained the highest specificity (0.8), while Gemini and Copilot scored lower (0.66 and 0.6, respectively). ChatGPT and Perplexity also recorded the highest accuracy (0.93), compared with 0.86 for both Gemini and Copilot.

The findings provide a detailed assessment of the LLMs' performance. Performance generally declined as questionnaire items became more complex, longer, and more verbose. Some LLMs, notably Copilot, had particular difficulty with quantitative questions involving numerical data. Further research is recommended to investigate these observations more comprehensively.
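The abstract names the confusion-matrix metrics but not their formulas. The following is a minimal sketch using the standard definitions. It assumes each questionnaire item is scored as a binary decision against an answer key; the abstract does not say how items were mapped to positives and negatives. `ConfusionMatrix`, `metrics`, and `confusion_from_answers` are hypothetical helpers. The answers in the usage example are invented for illustration and are not study data.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # key says positive, model answered positive
    fp: int  # key says negative, model answered positive
    tn: int  # key says negative, model answered negative
    fn: int  # key says positive, model answered negative


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a metric is undefined (e.g. no negative items).
    return num / den if den else math.nan


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Standard metrics derived from a confusion matrix."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "sensitivity": _ratio(cm.tp, cm.tp + cm.fn),  # TP / (TP + FN)
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),  # TN / (TN + FP)
        "ppv": _ratio(cm.tp, cm.tp + cm.fp),          # TP / (TP + FP)
        "npv": _ratio(cm.tn, cm.tn + cm.fn),          # TN / (TN + FN)
        "accuracy": _ratio(cm.tp + cm.tn, total),     # (TP + TN) / N
    }


def confusion_from_answers(reference: list[bool], predicted: list[bool]) -> ConfusionMatrix:
    """Tally counts from an answer key and one model's answers (assumed binary items)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    return ConfusionMatrix(
        tp=sum(1 for r, p in pairs if r and p),
        fp=sum(1 for r, p in pairs if not r and p),
        tn=sum(1 for r, p in pairs if not r and not p),
        fn=sum(1 for r, p in pairs if r and not p),
    )


if __name__ == "__main__":
    # Hypothetical answer key and model answers (illustrative only, not study data).
    reference = [True] * 10 + [False] * 10
    predicted = [True] * 8 + [False] * 2 + [True] * 4 + [False] * 6
    cm = confusion_from_answers(reference, predicted)
    print(cm)
    print({name: round(value, 2) for name, value in metrics(cm).items()})
```

With these invented answers the script prints sensitivity 0.8, specificity 0.6, PPV 0.67, NPV 0.75, and accuracy 0.7. Returning NaN for an empty denominator keeps a model with no negative items from crashing the comparison.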
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,697 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,602 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,127 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,872 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations