OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 27.03.2026, 04:16

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports

2025·27 Zitationen·npj Digital MedicineOpen Access
Volltext beim Verlag öffnen

27

Zitationen

11

Autoren

2025

Jahr

Abstract

Recent advancements in large language models (LLMs) have created new ways to support radiological diagnostics. While both open-source and proprietary LLMs can address privacy concerns through local or cloud deployment, open-source models provide advantages in continuity of access, and potentially lower costs. This study evaluated the diagnostic performance of fifteen open-source LLMs and one closed-source LLM (GPT-4o) in 1,933 cases from the Eurorad library. LLMs provided differential diagnoses based on clinical history and imaging findings. Responses were considered correct if the true diagnosis appeared in the top three suggestions. Models were further tested on 60 non-public brain MRI cases from a tertiary hospital to assess generalizability. In both datasets, GPT-4o demonstrated superior performance, closely followed by Llama-3-70B, revealing how open-source LLMs are rapidly closing the gap to proprietary models. Our findings highlight the potential of open-source LLMs as decision support tools for radiological differential diagnosis in challenging, real-world cases.

Ähnliche Arbeiten