
Comparative evaluation of six large language models in transfusion medicine: Addressing language and domain‐specific challenges

2025 · 2 citations · Vox Sanguinis


Abstract

BACKGROUND AND OBJECTIVES: Large language models (LLMs) such as GPT-4 are increasingly utilized in clinical and educational settings; however, their validity in subspecialized domains like transfusion medicine remains insufficiently characterized. This study assessed the performance of six LLMs on transfusion-related questions from Korean national licensing examinations for medical doctors (MDs) and medical technologists (MTs).

MATERIALS AND METHODS: A total of 23 MD and 67 MT questions (2020-2023) were extracted from publicly available sources. All items were originally written in Korean and subsequently translated into English to evaluate cross-linguistic performance. Each model received standardized multiple-choice prompts (five options), and correctness was determined by explicit answer selection. Accuracy was calculated as the proportion of correct responses, with 0.75 designated as the performance threshold. Chi-square tests were employed to analyse language-based differences.

RESULTS: GPT-4 and GPT-4o consistently surpassed the 0.75 threshold across both languages and examination types. GPT-3.5 demonstrated reasonable accuracy in English but showed a marked decline in Korean, suggesting limitations in multilingual generalization. Gemini 1.5 outperformed Gemini 1, particularly in Korean, though both exhibited variability across technical subdomains. Clova X showed inconsistent results across settings. All models demonstrated limited performance in legal and ethical scenarios.

CONCLUSION: GPT-4 and GPT-4o exhibited robust and reliable performance across a range of transfusion medicine topics. Nonetheless, inter-model and inter-language variability highlights the need for targeted fine-tuning, particularly in the context of local regulatory and ethical frameworks, to support safe and context-appropriate implementation in clinical practice.
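The scoring described in the methods comes down to two calculations. The first is per-model accuracy checked against the 0.75 threshold. The second is a chi-square test comparing Korean and English accuracy.

The Python sketch below shows one way to compute both. It is not the authors' code. It assumes each response has already been scored as correct or incorrect. It also assumes that the language comparison is a Pearson chi-square test with one degree of freedom and no continuity correction, applied to a 2x2 table of language by outcome. The abstract does not specify these details. The names `accuracy`, `meets_threshold` and `chi_square_2x2` are hypothetical.

```python
"""Hypothetical sketch of the accuracy and chi-square comparison in the abstract.

Assumptions (not stated in the paper): responses are pre-scored as booleans,
reaching exactly 0.75 counts as meeting the threshold, and the language
comparison is a Pearson chi-square test (1 df, no Yates correction).
"""
from math import erfc, sqrt

PASS_THRESHOLD = 0.75  # performance threshold given in the abstract


def accuracy(scored: list[bool]) -> float:
    """Proportion of items answered correctly."""
    if not scored:
        raise ValueError("no responses to score")
    return sum(scored) / len(scored)


def meets_threshold(acc: float) -> bool:
    """Assumption: an accuracy equal to the threshold counts as passing."""
    return acc >= PASS_THRESHOLD


def chi_square_2x2(correct_a: int, total_a: int,
                   correct_b: int, total_b: int) -> tuple[float, float]:
    """Pearson chi-square comparing two accuracy rates; returns (statistic, p)."""
    if not (0 <= correct_a <= total_a and 0 <= correct_b <= total_b):
        raise ValueError("correct counts must lie between 0 and the totals")
    # Rows: condition A / condition B; columns: correct / incorrect.
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    n = total_a + total_b
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            if expected == 0:
                raise ValueError("an expected count is zero; test is undefined")
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

Suppose a model's Korean and English runs cover the same item set. The two runs would then be compared as `chi_square_2x2(correct_ko, n_items, correct_en, n_items)`.

With only 23 MD items, some expected counts may be small. In that case, Fisher's exact test or a Yates-corrected statistic is often preferred. The abstract does not state which variant the study used.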

Topics

Blood transfusion and management; Artificial Intelligence in Healthcare and Education; Trauma, Hemostasis, Coagulopathy, Resuscitation
