OpenAlex · Updated hourly · Last updated: 27.03.2026, 23:31

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Beyond Pixel Agreement: Large Language Models as Clinical Guardrails for Reliable Medical Image Segmentation

Year: 2025 · Citations: 0 · Authors: 5

Abstract

Evaluating AI-generated medical image segmentations for clinical acceptability poses a significant challenge, as traditional pixel-agreement metrics often fail to capture true diagnostic utility. This paper introduces Hierarchical Clinical Reasoner (HCR), a novel framework that leverages Large Language Models (LLMs) as clinical guardrails for reliable, zero-shot quality assessment. HCR employs a structured, multistage prompting strategy that guides LLMs through a detailed reasoning process, encompassing knowledge recall, visual feature analysis, anatomical inference, and clinical synthesis, to evaluate segmentations. We evaluated HCR on a diverse dataset across six medical imaging tasks. Our results show that HCR, utilizing models like Gemini 2.5 Flash, achieved a classification accuracy of 78.12%, performing comparably to, and in some instances exceeding, dedicated vision models such as ResNet50 (72.92% accuracy) that were specifically trained for this task. The HCR framework not only provides accurate quality classifications but also generates interpretable, step-by-step reasoning for its assessments. This work demonstrates the potential of LLMs, when appropriately guided, to serve as sophisticated evaluators, offering a pathway towards more trustworthy and clinically aligned quality control for AI in medical imaging.
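To make the multistage prompting idea in the abstract more concrete, here is a minimal Python sketch of a four-stage prompt chain. Only the four stage names come from the abstract. The prompt wording, the function names (`run_staged_assessment`, `fake_model`), the verdict labels, and the text-only interface are assumptions made for illustration; the actual HCR prompts, image handling, and model calls are not described on this page.

```python
from typing import Callable, Dict, List, Tuple

# Stage names follow the abstract; the prompt templates are hypothetical.
STAGES: List[Tuple[str, str]] = [
    ("knowledge recall",
     "Recall the relevant anatomy and typical appearance for this task: {task}."),
    ("visual feature analysis",
     "Describe the visual features of the proposed segmentation."),
    ("anatomical inference",
     "Judge whether the segmented region is anatomically plausible."),
    ("clinical synthesis",
     "Combine the previous steps into a verdict: ACCEPTABLE or UNACCEPTABLE."),
]


def run_staged_assessment(task: str, ask_model: Callable[[str], str]) -> Dict[str, object]:
    """Run the stages in order, feeding earlier answers into later prompts.

    `ask_model` is a hypothetical stand-in for any LLM call (prompt in, text out).
    A real system would also pass the image and the segmentation mask.
    """
    transcript: List[Tuple[str, str]] = []
    context = ""
    for name, template in STAGES:
        prompt = context + f"[{name}] " + template.format(task=task)
        answer = ask_model(prompt)
        transcript.append((name, answer))
        # Accumulate the reasoning so the next stage can build on it.
        context += f"[{name}] {answer}\n"

    # Parse the final stage's answer into a coarse label (assumed label set).
    final = transcript[-1][1].upper()
    if "UNACCEPTABLE" in final:
        verdict = "unacceptable"
    elif "ACCEPTABLE" in final:
        verdict = "acceptable"
    else:
        verdict = "undetermined"
    return {"verdict": verdict, "transcript": transcript}


def fake_model(prompt: str) -> str:
    """Deterministic stub used instead of a real LLM for demonstration."""
    # The final line of the prompt is the current stage's instruction.
    if prompt.splitlines()[-1].startswith("[clinical synthesis]"):
        return "ACCEPTABLE: the contour follows the expected organ boundary."
    return "stub answer"


if __name__ == "__main__":
    result = run_staged_assessment("liver segmentation on CT", fake_model)
    print(result["verdict"])
    for stage, answer in result["transcript"]:
        print(f"{stage}: {answer}")
```

Keeping the per-stage answers in a transcript mirrors the interpretable, step-by-step reasoning the abstract mentions; how HCR itself stores or presents that reasoning is not specified here.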

Related works