OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 18.05.2026, 14:24

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Comparing generative artificial intelligence platforms and nursing student performance on a women’s health nursing examination in Korea: a Rasch model approach

2025·1 Zitationen·Journal of Educational Evaluation for Health ProfessionsOpen Access
Volltext beim Verlag öffnen

1

Zitationen

3

Autoren

2025

Jahr

Abstract

PURPOSE: This psychometric study aimed to compare the ability parameter estimates of generative artificial intelligence (AI) platforms with those of nursing students on a 50-item women's health nursing examination at Hallym University, Korea, using the Rasch model. It also sought to estimate item difficulty parameters and evaluate AI performance across varying difficulty levels. METHODS: The exam, consisting of 39 multiple-choice items and 11 true/false items, was administered to 111 fourth-year nursing students in June 2023. In December 2024, 6 generative AI platforms (GPT-4o, ChatGPT Free, Claude.ai, Clova X, Mistral.ai, Google Gemini) completed the same items. The responses were analyzed using the Rasch model to estimate the ability and difficulty parameters. Unidimensionality was verified by the Dimensionality Evaluation to Enumerate Contributing Traits (DETECT), and analyses were conducted using the R packages irtQ and TAM. RESULTS: The items satisfied unidimensionality (DETECT=-0.16). Item difficulty parameter estimates ranged from -3.87 to 1.96 logits (mean=-0.61), with a mean difficulty index of 0.79. Examinees' ability parameter estimates ranged from -0.71 to 3.14 logits (mean=1.17). GPT-4o, ChatGPT Free, and Claude.ai outperformed the median student ability (1.09 logits), scoring 2.68, 2.34, and 2.34, respectively, while Clova X, Mistral.ai, and Google Gemini exhibited lower scores (0.20, -0.12, 0.80). The test information curve peaked below θ=0, indicating suitability for examinees with low to average ability. CONCLUSION: Advanced generative AI platforms approximated the performance of high-performing students, but outcomes varied. The Rasch model effectively evaluated AI competency, supporting its potential utility for future AI performance assessments in nursing education.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationSimulation-Based Education in HealthcareHealth Education and Validation
Volltext beim Verlag öffnen