OpenAlex · Updated hourly · Last updated: May 11, 2026, 01:46

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Assessing the proficiency of large language models on funduscopic disease knowledge

2025 · 0 citations · International Journal of Ophthalmology · Open Access

Citations: 0 · Authors: 10 · Year: 2025

Abstract

AIM: To assess the performance of five distinct large language models (LLMs; ChatGPT-3.5, ChatGPT-4, PaLM2, Claude 2, and SenseNova) against two human cohorts (a group of funduscopic disease experts and a group of ophthalmologists) on the specialized subject of funduscopic disease. METHODS: The five LLMs and the two human groups independently completed a 100-item funduscopic disease test. Performance was evaluated by comparing average scores, response stability, and answer confidence. RESULTS: Among the LLMs, ChatGPT-4 and PaLM2 exhibited the strongest average correlation. ChatGPT-4 also achieved the highest average score and demonstrated the highest confidence on the exam. Compared with the human cohorts, ChatGPT-4 performed on par with the ophthalmologists but fell short of the funduscopic disease specialists. CONCLUSION: The study provides evidence of the exceptional performance of ChatGPT-4 in the domain of funduscopic disease. With continued enhancement, validated LLMs may yield unforeseen benefits in improving healthcare for both patients and physicians.
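The two core metrics named in the abstract, average score and response stability across repeated test runs, can be sketched as follows. This is a minimal illustration only; the function names, the boolean-per-item representation, and the sample data are assumptions for demonstration, not the authors' actual evaluation code:

```python
from statistics import mean

def average_score(attempts):
    """Mean fraction of correctly answered items across repeated runs.

    `attempts` is a list of runs; each run is a list of booleans
    (True = item answered correctly) over the same test items.
    """
    return mean(sum(run) / len(run) for run in attempts)

def response_stability(attempts):
    """Fraction of items answered identically in every run."""
    n_items = len(attempts[0])
    stable = sum(
        1 for i in range(n_items)
        if len({run[i] for run in attempts}) == 1
    )
    return stable / n_items

# Illustrative: two runs of a hypothetical 4-item test
runs = [[True, False, True, True],
        [True, True, True, True]]
print(average_score(runs))       # 0.875
print(response_stability(runs))  # 0.75
```

A model that scores well but answers the same item differently on each attempt would show a high average score with low stability, which is why the study reports both measures.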

Similar works