This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

From Prompts to Pipelines: Evaluating LLM-Generated Medical Image Segmentation Baselines

2026 · 0 citations · The Journal of Machine Learning for Biomedical Imaging

Abstract

Large Language Models (LLMs) are increasingly applied to automate complex tasks, but their potential for generating automated medical image segmentation pipelines remains largely underexplored. We present a systematic evaluation of open- and closed-source LLMs in generating U-Net–based segmentation frameworks across six diverse 2D medical image datasets, spanning endoscopy, fluoroscopy, dermoscopic photography, MRI, and fundus imaging. Building on an earlier study, we analyze state-of-the-art 2025 reasoning-enabled models and compare them to non-reasoning LLMs and a strong nnU-Net v2 baseline. Compared to their 2024 predecessors, the 2025 models demonstrated marked improvements in robustness, code quality, and segmentation accuracy across modalities. Our results show that reasoning-augmented LLMs achieve faster convergence, fewer execution errors, and higher Dice scores, while complex datasets with fine structures (e.g., retinal vessels) and volumetric data remain challenging. We also assessed robustness under repeated runs by comparing one reasoning and one non-reasoning model from the same family. Although the non-reasoning model, GPT-4o, produced consistent, template-like code across runs, the reasoning model, GPT-o4-mini-high, showed significantly lower run-to-run variability in validation loss and tighter Dice score distributions, demonstrating that chain-of-thought reasoning markedly improves both accuracy and stability. These findings highlight the potential of reasoning-enabled LLMs to automate segmentation workflows with high accuracy and explainability, paving the way for their integration into medical imaging pipelines. Our code is available at https://github.com/ankilab/LLM_based_Segmentation.git

Institutions

Friedrich-Alexander-Universität Erlangen-Nürnberg (DE)

Topics

Retinal Imaging and Analysis · Artificial Intelligence in Healthcare and Education · Advanced Neural Network Applications
