This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of Large Language Models in Nursing Examinations: Comparative Analysis of ChatGPT-3.5, ChatGPT-4 and iFLYTEK Spark in China
Citations: 1
Authors: 4
Year: 2025
Abstract
BACKGROUND: While large language models (LLMs) have been widely utilised in nursing education, their performance in Chinese nursing examinations remains unexplored, particularly in the context of ChatGPT-3.5, ChatGPT-4 and iFLYTEK Spark.
PURPOSE: This study assessed the performance of ChatGPT-3.5, ChatGPT-4 and iFLYTEK Spark on the 2022 China National Nursing Professional Qualification Exam (CNNPQE) at both the Junior and Intermediate levels. It also investigated whether the accuracy of these language models' responses correlated with the exam's difficulty or subject matter.
METHODS: We inputted 800 questions from the 2022 CNNPQE-Junior and CNNPQE-Intermediate exams into ChatGPT-3.5, ChatGPT-4 and iFLYTEK Spark to determine their accuracy rates in correctly answering the questions. We then analysed the correlation between these accuracy rates and the exams' difficulty levels or subjects.
RESULTS: = 97.435, df = 4, p < 0.001).
CONCLUSIONS: ChatGPT-4 and iFLYTEK Spark performed well on Chinese nursing examinations and demonstrated potential as valuable tools in nursing education.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations