OpenAlex · Updated hourly · Last updated: 24.05.2026, 11:37

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Development and comparative evaluation of knowledge graph–enhanced large language models for domain-specific question answering in nursing

2026 · 0 citations · BMC Nursing · Open Access

Citations: 0 · Authors: 13 · Year: 2026

Abstract

In busy clinical settings, timely access to comprehensive and actionable guideline recommendations can be limited, motivating interest in large language models (LLMs) as adjunct tools for scalable evidence access and education. This study aimed to evaluate LLM performance on domain-specific question-answering tasks in nursing and to assess how effectively GraphRAG (graph-based retrieval-augmented generation) optimizes LLMs. A knowledge graph was constructed from high-quality clinical practice guidelines for pressure injury management and integrated with two base models, Qwen-turbo-0715 and DeepSeek-V3.1, to develop optimized versions (Qwen-turbo-0715-GraphRAG and DeepSeek-V3.1-GraphRAG). Performance was compared among 10 non-specialist nurses, 10 specialist nurses, and the LLMs using a self-developed 25-item questionnaire, designed with expert input, that assessed knowledge of pressure injury management. Group differences were analyzed using the Kruskal–Wallis H test, followed by post hoc pairwise comparisons with Bonferroni correction. Average response times were also recorded for each model. Significant differences were observed among non-specialist nurses, specialist nurses, and LLMs (H = 17.662, P < 0.001). On this structured, guideline-derived benchmark under the study conditions, the LLM groups obtained higher scores than the nurse groups. Qwen-turbo-0715-GraphRAG achieved the highest mean score (98.4), followed by DeepSeek-V3.1-GraphRAG (87.2), Qwen-turbo-0715 (86.4), ChatGPT-5 (82.4), and DeepSeek-V3.1 (77.6). GraphRAG optimization was associated with higher benchmark scores but also with longer response times. Overall, knowledge graph–enhanced LLMs produced more guideline-aligned and source-grounded outputs on this benchmark. However, these improvements were observed within a standardized, guideline-based assessment framework and may not capture the full complexity of clinical expertise.
The findings support further evaluation of knowledge graph–enhanced LLMs in larger, more practice-oriented nursing settings and highlight the importance of considering the relative strengths of different base models before clinical application.
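The abstract names the statistical procedure but not its implementation details. The Python below is a minimal sketch of that kind of analysis. It computes the tie-corrected Kruskal–Wallis H statistic with its chi-square p-value (df = k − 1), then runs pairwise comparisons with a Bonferroni adjustment. Using Dunn's test for the pairwise step is an assumption, because the abstract does not name the post hoc test. The function names are hypothetical, and the scores in the usage block are invented for illustration, not data from the study. The sketch uses only the standard library. `scipy.stats.kruskal` provides the omnibus test ready-made.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tie_sum(values):
    """Sum of t^3 - t over every run of t tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def _pooled_rank_sums(groups):
    """Rank all observations together; return per-group rank sums and the pooled data."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums, pooled


def kruskal_wallis(groups):
    """Hypothetical helper: tie-corrected H statistic and chi-square p-value (df = k - 1)."""
    rank_sums, pooled = _pooled_rank_sums(groups)
    n = len(pooled)
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    correction = 1 - tie_sum(pooled) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def dunn_bonferroni(groups):
    """Hypothetical helper: pairwise Dunn z-tests with Bonferroni-adjusted two-sided p-values."""
    rank_sums, pooled = _pooled_rank_sums(groups)
    n = len(pooled)
    mean_ranks = [r / len(g) for r, g in zip(rank_sums, groups)]
    variance = n * (n + 1) / 12 - tie_sum(pooled) / (12 * (n - 1))
    pairs = list(combinations(range(len(groups)), 2))
    adjusted = {}
    for i, j in pairs:
        se = math.sqrt(variance * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        raw_p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        adjusted[(i, j)] = min(1.0, raw_p * len(pairs))  # Bonferroni, capped at 1
    return adjusted


if __name__ == "__main__":
    # Invented scores for illustration only; NOT data from the study.
    groups = [[60, 64, 68, 72], [72, 76, 80, 84], [84, 88, 92, 96]]
    h, p = kruskal_wallis(groups)
    print(f"H = {h:.3f}, p = {p:.4f}")
    print(dunn_bonferroni(groups))
```

Dunn's test reuses the pooled ranks from the omnibus test, so the pairwise comparisons stay on the same scale as H. The Bonferroni step multiplies each raw p-value by the number of pairs, capped at 1. This step is conservative when many groups are compared.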
