Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Comparative evaluation of six large language models in transfusion medicine: Addressing language and domain‐specific challenges
2
Zitationen
6
Autoren
2025
Jahr
Abstract
BACKGROUND AND OBJECTIVES: Large language models (LLMs) such as GPT-4 are increasingly utilized in clinical and educational settings; however, their validity in subspecialized domains like transfusion medicine remains insufficiently characterized. This study assessed the performance of six LLMs on transfusion-related questions from Korean national licensing examinations for medical doctors (MDs) and medical technologists (MTs). MATERIALS AND METHODS: A total of 23 MD and 67 MT questions (2020-2023) were extracted from publicly available sources. All items were originally written in Korean and subsequently translated into English to evaluate cross-linguistic performance. Each model received standardized multiple-choice prompts (five options), and correctness was determined by explicit answer selection. Accuracy was calculated as the proportion of correct responses, with 0.75 designated as the performance threshold. Chi-square tests were employed to analyse language-based differences. RESULTS: GPT-4 and GPT-4o consistently surpassed the 0.75 threshold across both languages and examination types. GPT-3.5 demonstrated reasonable accuracy in English but showed a marked decline in Korean, suggesting limitations in multilingual generalization. Gemini 1.5 outperformed Gemini 1, particularly in Korean, though both exhibited variability across technical subdomains. Clova X showed inconsistent results across settings. All models demonstrated limited performance in legal and ethical scenarios. CONCLUSION: GPT-4 and GPT-4o exhibited robust and reliable performance across a range of transfusion medicine topics. Nonetheless, inter-model and inter-language variability highlights the need for targeted fine-tuning, particularly in the context of local regulatory and ethical frameworks, to support safe and context-appropriate implementation in clinical practice.
Ähnliche Arbeiten
A Multicenter, Randomized, Controlled Clinical Trial of Transfusion Requirements in Critical Care
1999 · 5.207 Zit.
Effects of tranexamic acid on death, vascular occlusive events, and blood transfusion in trauma patients with significant haemorrhage (CRASH-2): a randomised, placebo-controlled trial
2010 · 3.243 Zit.
Transfusion of Plasma, Platelets, and Red Blood Cells in a 1:1:1 vs a 1:1:2 Ratio and Mortality in Patients With Severe Trauma
2015 · 2.441 Zit.
Treatment of 5 Critically Ill Patients With COVID-19 With Convalescent Plasma
2020 · 2.424 Zit.
A Multicenter, Randomized, Controlled Clinical Trial of Transfusion Requirements in Critical Care
1999 · 2.045 Zit.