
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Observer-Performance Comparison of ChatGPT-5 and Gemini 2.5 Pro Versus Veterinarians in Canine and Feline Fundus Interpretation: A Multi-Reader, Multi-Case Study

2026 · 0 citations · Veterinary Ophthalmology · Open Access

Citations: 0 · Authors: 6 · Year: 2026

Abstract

OBJECTIVE: To compare two large language models (ChatGPT-5, Gemini 2.5 Pro) with experienced and novice veterinarians on canine and feline fundus cases, and to assess the relationship between perceived case difficulty and diagnostic performance.

ANIMALS STUDIED: Forty-three client-owned cases were sampled from 200 ophthalmology records.

PROCEDURE(S): Each case included signalment, history, and fundus photographs. Two experienced veterinarians, two novice veterinarians, and two LLMs independently selected findings and provided a diagnosis from options. Participants rated difficulty (Very Easy-Hard). Group differences were tested with Kruskal-Wallis and Dunn-Bonferroni procedures; associations with difficulty used Spearman's ρ; paired proportions used Cochran's Q with Holm-adjusted McNemar tests.

RESULTS: Experts achieved the highest accuracies (findings: 73.3% and 61.6%; diagnosis: 86.0% and 66.3%), significantly outperforming LLMs and novices (all adjusted p < 0.05). LLM finding accuracies were 52.0% (ChatGPT-5) and 49.3% (Gemini 2.5 Pro), both above novices (28.3% and 26.9%). LLM diagnosis accuracies were lower (ChatGPT-5: 37.2%, Gemini 2.5 Pro: 37.2%) but still numerically higher than novices (23.1% and 22.5%). Expert accuracy declined with increasing case difficulty, whereas LLM performance was comparatively stable (ChatGPT-5 range 2.37-3.86; Gemini 2.5 Pro 2.00-2.95). Difficulty correlated negatively with Expert 2 totals (ρ = -0.70, p < 0.0001) but not with LLMs (|ρ| ≤ 0.17, p ≥ 0.28).

CONCLUSIONS: Experienced veterinarians are most accurate in fundus interpretation, but their performance declines with increasing difficulty. LLMs, though less accurate, remain stable across cases and outperform novices, indicating value as training or decision-support tools. Future studies should assess whether expert-LLM collaboration enhances accuracy and efficiency.
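For readers unfamiliar with the paired-proportion step named in the procedure, the following is a minimal sketch of Holm-adjusted McNemar tests. It is not the authors' code: the function names `mcnemar_exact` and `holm_adjust` are hypothetical, the exact (binomial) form of McNemar's test is an assumption since the abstract does not say which variant was used, and the discordant-pair counts are invented purely for illustration.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b = cases reader A got right and reader B got wrong; c = the reverse.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Invented discordant counts (b, c) for three hypothetical reader pairs.
discordant = {"A vs B": (12, 3), "A vs C": (9, 8), "B vs C": (4, 10)}
raw = [mcnemar_exact(b, c) for b, c in discordant.values()]
for label, p_raw, p_adj in zip(discordant, raw, holm_adjust(raw)):
    print(f"{label}: raw p = {p_raw:.4f}, Holm-adjusted p = {p_adj:.4f}")
```

In a design like the one described, such pairwise tests would typically follow a significant omnibus Cochran's Q result, with the Holm step controlling the family-wise error rate across the reader pairs.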

Similar works