This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluating the Accuracy of Gemini 2.0 Advanced and ChatGPT 4o in Cataract Knowledge: A Performance Analysis Using Brazilian Council of Ophthalmology Board Exam Questions
Citations: 4 · Authors: 2 · Year: 2025
Abstract
INTRODUCTION: Large language models (LLMs) like Gemini 2.0 Advanced and ChatGPT-4o are increasingly applied in medical contexts. This study assesses their accuracy in answering cataract-related questions from Brazilian ophthalmology board exams, evaluating their potential for clinical decision support.
METHODS: A retrospective analysis was conducted using 221 multiple-choice questions. Responses from both LLMs were evaluated by two independent ophthalmologists against the official answer key. Accuracy rates and inter-evaluator agreement (Cohen's kappa) were analyzed.
RESULTS: Gemini 2.0 Advanced achieved 85.45% and 80.91% accuracy, while ChatGPT-4o scored 80.00% and 84.09%. Inter-evaluator agreement was moderate (κ = 0.514 and 0.431, respectively). Performance varied across exam years.
CONCLUSION: Both models demonstrated high accuracy in cataract-related board exam questions, supporting their potential as educational tools. However, moderate agreement and performance variability indicate the need for further refinement and validation.
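The inter-evaluator agreement statistic reported above, Cohen's kappa, corrects raw agreement for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The following is a minimal pure-Python sketch of that formula for illustration only; it is not the authors' analysis code, and the function name and inputs are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the chance-expected proportion
    derived from each rater's marginal label frequencies.
    """
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence of the two raters.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Example: two raters judging four LLM answers as correct (1) or incorrect (0).
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # → 0.5
```

Values in the 0.41–0.60 range, like the κ = 0.514 and 0.431 reported here, are conventionally interpreted as "moderate" agreement.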
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations