This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Benchmarking clinical knowledge and multi-modal reasoning of large language models in liver cirrhosis

2026 · 0 citations · Scientific Reports · Open Access

Abstract

As large language models (LLMs) become increasingly integral to healthcare, patients are frequently turning to them in the long-term management of chronic conditions like liver cirrhosis. However, the lack of standardized benchmarks complicates the selection of the most suitable models for specific clinical tasks. To address this gap, we constructed a question bank comprising 462 multiple-choice, 25 short-answer, and 40 multi-modal case questions to evaluate six mainstream LLMs in terms of clinical knowledge and multi-modal reasoning capability related to liver cirrhosis. Gemini-2.5pro dominated structured knowledge tasks with 88.5% accuracy in multiple-choice questions, while Grok-4 excelled in multi-modal reasoning, achieving 86.7% accuracy in case questions and outperforming Gemini-2.5pro across all dimensions in specialist-evaluated short-answer responses. Task-specific analysis further revealed complementary strengths, with GPT-5 excelling in diagnosis and DeepSeek-R1 in test interpretation. These findings highlight the distinct advantages of different models, underscoring the potential of LLMs as valuable auxiliary tools for medical education and decision support in cirrhosis management.

Topics

Artificial Intelligence in Healthcare and Education · Topic Modeling · Multimodal Machine Learning Applications
