This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Benchmarking clinical knowledge and multi-modal reasoning of large language models in liver cirrhosis
Citations: 0
Authors: 7
Year: 2026
Abstract
As large language models (LLMs) become increasingly integral to healthcare, patients are frequently turning to them for help with the long-term management of chronic conditions such as liver cirrhosis. However, the lack of standardized benchmarks makes it difficult to select the most suitable model for a specific clinical task. To address this gap, we constructed a question bank of 462 multiple-choice, 25 short-answer, and 40 multi-modal case questions and used it to evaluate six mainstream LLMs on clinical knowledge and multi-modal reasoning related to liver cirrhosis. Gemini-2.5pro dominated structured knowledge tasks with 88.5% accuracy on multiple-choice questions, while Grok-4 excelled in multi-modal reasoning, achieving 86.7% accuracy on case questions and outperforming Gemini-2.5pro across all dimensions in specialist-evaluated short-answer responses. Task-specific analysis further revealed complementary strengths, with GPT-5 excelling in diagnosis and DeepSeek-R1 in test interpretation. These findings highlight the distinct advantages of different models and underscore the potential of LLMs as auxiliary tools for medical education and decision support in cirrhosis management.
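The abstract reports overall multiple-choice accuracy as well as task-specific strengths (e.g. diagnosis vs. test interpretation). The following is a minimal, hypothetical sketch of how such per-model scoring could be computed; the `MCQItem` schema, the `accuracy_by_task` helper, and the rule that a missing answer counts as wrong are assumptions for illustration, not the authors' published pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQItem:
    """One multiple-choice question (hypothetical schema, not from the paper)."""
    qid: str
    task: str    # task category, e.g. "diagnosis" or "test interpretation"
    answer: str  # gold option letter


def accuracy_by_task(items, predictions):
    """Return (overall accuracy, {task: accuracy}) for one model.

    `predictions` maps question id -> option letter chosen by the model.
    Assumption: a missing prediction is scored as incorrect, and option
    letters are compared case-insensitively.
    """
    correct = 0
    per_task = defaultdict(lambda: [0, 0])  # task -> [correct, total]
    for item in items:
        chosen = predictions.get(item.qid, "").strip().upper()
        hit = chosen == item.answer.upper()
        correct += hit
        per_task[item.task][0] += hit
        per_task[item.task][1] += 1
    overall = correct / len(items) if items else 0.0
    return overall, {task: c / n for task, (c, n) in per_task.items()}


# Toy data only; these are not questions or results from the study.
items = [
    MCQItem("q1", "diagnosis", "A"),
    MCQItem("q2", "diagnosis", "C"),
    MCQItem("q3", "test interpretation", "B"),
    MCQItem("q4", "test interpretation", "D"),
]
preds = {"q1": "a", "q2": "C", "q3": "B"}  # q4 left unanswered
overall, per_task = accuracy_by_task(items, preds)
print(f"overall: {overall:.1%}", per_task)
```

Running the same function once per model over a shared question bank gives directly comparable accuracies, and the per-task breakdown is what exposes complementary strengths that a single overall score would hide.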