This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The performance of artificial intelligence-based large language models on ophthalmology-related questions in Swedish proficiency test for medicine: ChatGPT-4 omni vs Gemini 1.5 Pro
Citations: 15
Authors: 6
Year: 2024
Abstract
• The AI-based LLMs ChatGPT-4o and Gemini 1.5 Pro demonstrate robust performance on ophthalmology-related MCQs, which are occasionally challenging even for healthcare professionals.
• They have the potential to contribute to ophthalmological medical education, not only by selecting correct answers to MCQs but also by providing explanations.
• While both AI platforms are beneficial, ChatGPT-4o is one step ahead.

Purpose: To compare the interpretation and response context of two commonly used artificial intelligence (AI)-based large language model (LLM) platforms on ophthalmology-related multiple-choice questions (MCQs) from the Swedish proficiency test for medicine ("kunskapsprov för läkare").

Design: Observational study.

Methods: The questions of a total of 29 exams held between 2016 and 2024 were reviewed. All ophthalmology-related questions were included in this study and categorized into ophthalmology sections. Questions were posed to the ChatGPT-4o and Gemini 1.5 Pro AI-based LLM chatbots in Swedish and English with specific commands. Second, all MCQs were asked again without feedback. As the final step, feedback was given for questions that were still answered incorrectly, and all questions were subsequently re-asked.

Results: A total of 134 ophthalmology-related questions out of 4876 MCQs were evaluated with both AI-based LLMs. The mean number of ophthalmology MCQs per exam across the 29 exams was 4.62 ± 2.21 (range: 0–8). After the final step, ChatGPT-4o achieved higher accuracy in Swedish (94%) and English (95.5%) than Gemini 1.5 Pro (88.1% in both languages) (p = 0.13 and p = 0.04, respectively). Moreover, ChatGPT-4o provided more correct answers than Gemini 1.5 Pro in the neuro-ophthalmology section (n = 47) across all three attempts in English (p < 0.05). There was no statistically significant difference in the inter-AI comparison of the other ophthalmology sections or in the interlingual comparison within each AI.
Conclusions: Both AI-based LLMs, and especially ChatGPT-4o, appear to perform well on ophthalmology-related MCQs. AI-based LLMs can contribute to ophthalmological medical education not only by selecting correct answers to MCQs but also by providing explanations.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,324 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,189 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,588 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,470 citations
Authors
Institutions
- Kastamonu University (TR)
- Sahlgrenska University Hospital (SE)
- Region Västra Götaland (SE)
- Södra Älvsborg Hospital (SE)
- University of Copenhagen (DK)
- Copenhagen University Hospital (DK)
- Rigshospitalet (DK)
- Institute of Clinical Research (US)
- University of Southern Denmark (DK)
- University of Health Sciences Antigua (AG)