This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Benchmarking proprietary and open-source language and vision-language models for gastroenterology clinical reasoning
Citations: 1
Authors: 18
Year: 2025
Abstract
This study evaluated the effectiveness of large language models (LLMs) and vision-language models (VLMs) in gastroenterology. We used board-style multiple-choice questions to assess the performance of both proprietary and open-source LLMs and VLMs, including GPT, Claude, Gemini, Mistral, Llama, Mixtral, Phi, and Qwen, across different interfaces, computing environments, and levels of compression (quantization). Among the proprietary models, o1-preview (82.0%) and Claude3.5-Sonnet (74.0%) had the highest accuracy, outperforming the top open-source models, Llama3.3-70b (65.7%) and Qwen-2.5-72b (61.0%). Among the small quantized open-source models, the 8-bit Llama 3.2-11b (51.7%) and 6-bit Phi3-14b (48.7%) performed best, with scores comparable to those of their full-precision counterparts. Notably, VLM accuracy on image-containing questions improved by about 10% when the models were given human-generated captions, remained unchanged with the original images, and declined with LLM-generated captions. Further research is warranted to evaluate model capabilities in real-world clinical decision-making scenarios.
Related works
MizAR 60 for Mizar 50
2023 · 74,522 citations
ImageNet: A large-scale hierarchical image database
2009 · 60,633 citations
Microsoft COCO: Common Objects in Context
2014 · 41,283 citations
Fully convolutional networks for semantic segmentation
2015 · 36,387 citations
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,474 citations
Authors
Institutions
- Child Health and Development Institute (US)
- Icahn School of Medicine at Mount Sinai (US)
- The University of Texas Health Science Center at San Antonio (US)
- Virginia Hospital Center (US)
- Shahid Beheshti University of Medical Sciences (IR)
- Stanford University (US)
- Cedars-Sinai Medical Center (US)
- Inova Fairfax Hospital (US)
- University of California, Los Angeles (US)
- Texas Center for Infectious Disease (US)
- New York University (US)
- University of California System (US)
- Columbia University (US)