This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

A comparative analysis of DeepSeek R1, DeepSeek-R1-Lite, OpenAi o1 Pro, and Grok 3 performance on ophthalmology board-style questions

2025 · 15 citations · Scientific Reports · Open Access

Abstract

The ability of large language models (LLMs) to accurately answer medical board-style questions reflects their potential to benefit medical education and real-time clinical decision-making. With the recent advance to reasoning models, the latest LLMs excel at addressing complex problems in benchmark math and science tests. This study assessed the performance of first-generation reasoning models (DeepSeek's R1 and R1-Lite, OpenAI's o1 Pro, and Grok 3) on 493 ophthalmology questions sourced from the StatPearls and EyeQuiz question banks. o1 Pro achieved the highest overall accuracy (83.4%), significantly outperforming DeepSeek R1 (72.5%), DeepSeek-R1-Lite (76.5%), and Grok 3 (69.2%) (p < 0.001 for all pairwise comparisons). o1 Pro also demonstrated superior performance on questions from eight of nine ophthalmologic subfields, on questions of second- and third-order cognitive complexity, and on image-based questions. DeepSeek-R1-Lite performed second best despite relatively small memory requirements, while Grok 3 had the lowest overall accuracy. These findings demonstrate that the strong performance of the first-generation reasoning models extends beyond benchmark tests to high-complexity ophthalmology questions. While these findings suggest a potential role for reasoning models in medical education and clinical practice, further research is needed to understand their performance with real-world data, their integration into educational and clinical settings, and human-AI interactions.

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · COVID-19 diagnosis using AI
