This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Artificial Intelligence vs. Human Cognition: A Comparative Analysis of ChatGPT and Candidates Sitting the European Board of Ophthalmology Diploma Examination
2
Citations
7
Authors
2025
Year
Abstract
Background/Objectives: This paper assesses ChatGPT’s performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and compares the results with pass benchmarks and candidate results. Methods: This cross-sectional study used a sample of past papers from the 2012, 2013, and 2020–2023 EBOD examinations. It analyzed ChatGPT’s responses to 440 multiple choice questions (MCQs), each containing five true/false statements (2200 statements in total), and to 48 single best answer (SBA) questions. Results: On MCQs, ChatGPT scored 64.39% on average. Its strongest MCQ metric was precision (68.76%). ChatGPT performed best on pathology MCQs (Grubbs test p < 0.05). Optics and refraction was the lowest-scoring MCQ topic across all metrics. ChatGPT-3.5 Turbo performed worse than human candidates and ChatGPT-4o on easy questions (75% vs. 100% accuracy) but outperformed humans and ChatGPT-4o on challenging questions (50% vs. 28% accuracy). ChatGPT’s SBA performance averaged 28.43%, with precision as its strongest metric (29.36%). Pathology was consistently the lowest-scoring SBA topic across most metrics. ChatGPT showed a nonsignificant tendency to select option 1 more frequently (p = 0.19). On SBAs, human candidates scored higher than ChatGPT on every metric measured. Conclusions: ChatGPT performed better on true/false questions, achieving a pass mark in most instances. Performance was poorer on SBA questions, suggesting that ChatGPT is better at information retrieval than at knowledge integration. ChatGPT could become a valuable tool in ophthalmic education, for example by allowing exam boards to check that their papers are pitched at the right level, by marking open-ended questions, and by providing detailed feedback.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,774 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,685 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,244 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations