This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Comparative Analysis of Medical-Domain and General-Purpose Large Language Models: Evaluating Specialization in Healthcare Applications
Citations: 0
Authors: 2
Year: 2026
Abstract
The growing use of Large Language Models (LLMs) in healthcare raises important questions about the need for domain-specific training in medical applications. This study presents a detailed evaluation of medical-domain and general-purpose LLMs using three medical datasets (PubMedQA, BioASQ, and WikiDoc), which contain approximately 11,000 question-answer pairs. We evaluated four medical-domain models (Meditron-7B, BioMistral-7B, MedAlpaca-13B, and PMC-LLaMA-13B) against four general-purpose instruction-tuned models (Ministral-8B-Instruct, Gemma 2-9B-it, Vicuna-13B v1.5, and Llama 3-8B-Instruct). Across 182,944 prompts in both zero-shot and few-shot settings, our findings show that general-purpose models consistently outperformed their medical-specific counterparts on all evaluation metrics. Specifically, Ministral-8B-Instruct achieved the highest performance in few-shot settings with a BERTScore of 0.613, SimCSE of 0.764, and semantic similarity of 0.684. These scores were significantly higher than those of the best medical model, BioMistral-7B (0.545, 0.678, and 0.533, respectively). Furthermore, zero-shot performance often matched or surpassed few-shot results, as seen with Llama-3-8B-Instruct achieving a SimCSE score of 0.794. These findings challenge the common assumption that domain-specific pretraining is required for optimal performance in specialized tasks and have major implications for how resources are allocated in healthcare AI development.
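The embedding-based metrics named in the abstract (SimCSE score, semantic similarity) ultimately reduce to cosine similarity between sentence embeddings of a model answer and a reference answer. The following is a minimal sketch of that final scoring step only, using hand-made toy vectors in place of real encoder outputs; it is not the paper's evaluation pipeline, and the vector values are illustrative assumptions.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" standing in for sentence-encoder outputs
# (real SimCSE/BERT embeddings have hundreds of dimensions).
reference_embedding = [0.20, 0.70, 0.10]
candidate_embedding = [0.25, 0.65, 0.05]

score = cosine_similarity(reference_embedding, candidate_embedding)
```

In practice the reported numbers (e.g. a SimCSE score of 0.764) are such similarities averaged over all question-answer pairs in the benchmark.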
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,324 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,189 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,588 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,470 citations