This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Rise of the Machines: Comparing Performance of Artificial Intelligence Large Language Models on Pharmacy Specialty Certification Examination Practice Questions.
Citations: 0
Authors: 3
Year: 2026
Abstract
BACKGROUND: Large language models (LLMs) are increasingly used for clinical information retrieval and decision support, yet comparative performance on pharmacy board examination-style content across specialties remains incompletely characterized.

METHODS: We evaluated 15 LLMs using 145 publicly available Board of Pharmacy Specialties (BPS) certification practice questions spanning 14 specialty domains. Questions were entered using a standardized prompt without additional prompt engineering. Model responses were scored against BPS-posted answer keys. Overall and specialty-level accuracy were summarized descriptively. Differences among LLMs were tested using Cochran's Q with Bonferroni-adjusted McNemar pairwise comparisons when appropriate, and LLMs were assessed using their default user-facing settings.

RESULTS: Across all LLMs, mean accuracy was 86.2% (standard deviation [SD], 3.5%), corresponding to an average of 125/145 items answered correctly. Accuracy ranged from 79.3% (95% confidence interval [CI], 72.6%-86.0%) for Perplexity AI to 91.7% (95% CI, 87.2%-96.3%) for Microsoft Copilot (GPT-5). Overall performance differed significantly across LLMs (Cochran's Q = 46.262; df = 14; p < 0.001). After Bonferroni adjustment, Microsoft Copilot (GPT-5), Google Gemini 2.5 Flash, and OpenAI o3 (Reasoning) outperformed Perplexity AI (p < 0.001). Microsoft Copilot (GPT-5) also outperformed an earlier version of Microsoft Copilot (GPT-4.1) (p < 0.001). Specialty-level heterogeneity was generally limited, with significant model differences observed in Solid Organ Transplantation Pharmacy and Nuclear Pharmacy.

CONCLUSIONS: LLMs demonstrated high accuracy on BPS certification practice questions, with limited variability across LLMs and select specialty domains. These findings support continued evaluation of LLMs for potential use in pharmacy practice and clinical decision support, emphasizing the need for domain-specific validation and ongoing monitoring as LLMs evolve.
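The statistical approach described in METHODS (Cochran's Q across all models, followed by Bonferroni-adjusted exact McNemar pairwise comparisons) can be sketched in pure Python. This is a minimal illustration under assumed data: the function names and the toy score matrix are not from the study, and real analyses typically use a statistics package rather than hand-rolled tests.

```python
from math import comb

def cochrans_q(results):
    """Cochran's Q statistic for k related binary samples.

    results: list of k per-model score lists, each of length n
             (1 = item answered correctly, 0 = incorrect).
    Under H0 (equal accuracy), Q ~ chi-square with k-1 df.
    """
    k = len(results)
    n = len(results[0])
    model_totals = [sum(m) for m in results]                      # treatment totals
    item_totals = [sum(m[i] for m in results) for i in range(n)]  # block totals
    grand = sum(model_totals)
    num = (k - 1) * (k * sum(t * t for t in model_totals) - grand * grand)
    den = k * grand - sum(b * b for b in item_totals)
    return num / den

def mcnemar_exact(a, b):
    """Exact two-sided McNemar p-value for two paired binary score lists,
    based on the binomial distribution of the discordant pairs."""
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n = n01 + n10
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(n01, n10) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Toy data: 3 "models" scored on 4 items (illustrative only).
scores = [[1, 1, 1, 0],
          [1, 0, 1, 0],
          [0, 0, 1, 0]]
q = cochrans_q(scores)  # compare to chi-square critical value, df = k-1

# Bonferroni adjustment: divide alpha by the number of pairwise tests.
k = len(scores)
alpha = 0.05 / comb(k, 2)
p_pair = mcnemar_exact(scores[0], scores[2])
```

A pairwise comparison is declared significant only if its exact McNemar p-value falls below the Bonferroni-adjusted threshold, which controls the family-wise error rate across all model pairs.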