This is an overview page with metadata for this scientific article. The full article is available from the publisher.

Evaluating large language models as clinical laboratory test recommenders in primary and emergency care: a crucial step in clinical decision making

2025 · 3 citations · Clinical Chemistry and Laboratory Medicine (CCLM) · Open Access

Abstract

OBJECTIVES: Large language models (LLMs), such as OpenAI's GPT-4o, have demonstrated considerable promise in transforming clinical decision support systems. In this study, we focused on a single but crucial task of clinical decision-making: laboratory test ordering.

METHODS: We evaluated the self-consistency and performance of GPT-4o as a laboratory test recommender for 15 simulated clinical cases of different complexities across primary and emergency care settings. Through two prompting strategies (zero-shot and chain-of-thought), the model's recommendations were evaluated against expert consensus-derived gold-standard laboratory test orders categorized into essential and conditional test orders.

RESULTS: We found that GPT-4o exhibited high self-consistency across repeated prompts, surpassing the consistency observed among individual expert orders in the earliest round of consensus. Precision was moderate to high for both prompting strategies (68-82 %), although relatively lower recall (41-51 %) highlighted a risk of underutilization. A detailed analysis of false negatives (FNs) and false positives (FPs) could explain some gaps in recall and precision. Notably, variability in recommendations centered primarily on conditional tests, reflecting the broader diagnostic uncertainty that can arise in diverse clinical contexts. Our analysis revealed that neither prompting strategy, case complexity, nor clinical context significantly affected GPT-4o's performance.

CONCLUSIONS: This work underscores the promise of LLMs in optimizing laboratory test ordering while identifying gaps for enhancing their alignment with clinical practice. Future research should focus on real-world implementation, integrating clinician feedback, and ensuring alignment with local test menus and guidelines to improve both performance and trust in LLM-driven clinical decision support.
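The precision, recall, and self-consistency figures reported in the abstract depend on how a recommended test list is matched against the gold-standard order. The abstract does not give the exact metric definitions, so the following Python sketch is only one plausible reading. The function names, the example test names, the choice to treat conditional tests as neither false positives nor false negatives, and the use of mean pairwise Jaccard similarity for self-consistency are all illustrative assumptions, not the authors' method.

```python
from itertools import combinations


def precision_recall(recommended, essential, conditional=frozenset()):
    """Score one recommended test list against a gold-standard order.

    Assumption (not from the paper): recommending a conditional test is
    not counted as a false positive, and omitting one is not counted as
    a false negative. Only essential tests drive recall.
    """
    recommended = set(recommended)
    essential = set(essential)
    conditional = set(conditional)

    true_pos = recommended & essential
    false_pos = recommended - essential - conditional
    false_neg = essential - recommended

    scored = len(true_pos) + len(false_pos)
    precision = len(true_pos) / scored if scored else 0.0
    recall = len(true_pos) / len(essential) if essential else 0.0
    return precision, recall, sorted(false_pos), sorted(false_neg)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(runs):
    """Hypothetical self-consistency measure over repeated prompt runs."""
    sets = [set(run) for run in runs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up example case; test names are placeholders, not study data.
gold_essential = {"CBC", "CRP", "creatinine", "troponin"}
gold_conditional = {"D-dimer"}
model_order = ["CBC", "CRP", "troponin", "D-dimer", "TSH"]

precision, recall, fps, fns = precision_recall(
    model_order, gold_essential, gold_conditional
)
# precision = 3 / 4, recall = 3 / 4, fps = ["TSH"], fns = ["creatinine"]

repeated_runs = [["CBC", "CRP"], ["CBC", "CRP"], ["CBC", "CRP", "TSH"]]
consistency = mean_pairwise_jaccard(repeated_runs)  # (1 + 2/3 + 2/3) / 3
```

Under this reading, an omitted essential test lowers recall (the underutilization risk the abstract describes), while an extra test outside the gold standard lowers precision. A different treatment of conditional tests would shift both numbers, so the sketch should not be used to reproduce the reported ranges.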

Topics

Artificial Intelligence in Healthcare and Education · Electronic Health Records Systems · Clinical Reasoning and Diagnostic Skills
