
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Multimodal Performance of GPT-4 in Complex Ophthalmology Cases

2025 · 12 citations · Journal of Personalized Medicine · Open Access

Authors: 12

Abstract

Objectives: The integration of multimodal capabilities into GPT-4 represents a transformative leap for artificial intelligence in ophthalmology, yet its utility in scenarios requiring advanced reasoning remains underexplored. This study evaluates GPT-4’s multimodal performance on open-ended diagnostic and next-step reasoning tasks in complex ophthalmology cases, comparing it against human expertise.

Methods: GPT-4 was assessed across three study arms: (1) text-based case details with figure descriptions, (2) cases with text and accompanying ophthalmic figures, and (3) cases with figures only (no figure descriptions). We compared GPT-4’s diagnostic and next-step accuracy across arms and benchmarked its performance against three board-certified ophthalmologists.

Results: GPT-4 achieved 38.4% (95% CI [33.9%, 43.1%]) diagnostic accuracy and 57.8% (95% CI [52.8%, 62.2%]) next-step accuracy when prompted with figures without descriptions. Diagnostic accuracy declined significantly compared to text-only prompts (p = 0.007), though the next-step performance was similar (p = 0.140). Adding figure descriptions restored diagnostic accuracy (49.3%) to near parity with text-only prompts (p = 0.684). Using figures without descriptions, GPT-4’s diagnostic accuracy was comparable to two ophthalmologists (p = 0.30, p = 0.41) but fell short of the highest-performing ophthalmologist (p = 0.0004). For next-step accuracy, GPT-4 was similar to one ophthalmologist (p = 0.22) but underperformed relative to the other two (p = 0.0015, p = 0.0017).

Conclusions: GPT-4’s diagnostic performance diminishes when relying solely on ophthalmic images without textual context, highlighting limitations in its current multimodal capabilities. Despite this, GPT-4 demonstrated comparable performance to at least one ophthalmologist on both diagnostic and next-step reasoning tasks, emphasizing its potential as an assistive tool. Future research should refine multimodal prompts and explore iterative or sequential prompting strategies to optimize AI-driven interpretation of complex ophthalmic datasets.

Similar works