This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Herald: Democratizing Compositional Reasoning for Visual Tasks without Any Training
Citations: 0
Authors: 5
Year: 2025
Abstract
Premium large language models (LLMs) such as GPT-4 offer impressive multimodal performance, yet their paywall limits both accessibility and reproducibility. We ask whether a coalition of freely accessible LLMs, each individually noisy and uncertain, can collectively rival or surpass their premium counterparts when treated as collaborative, on-the-fly programmers. We present Herald, a framework that (i) primes a diverse pool of zero-cost API LLMs with chain-of-thought cues, (ii) harvests their responses as human-readable Python fragments, control-flow branches, and live API calls, and (iii) employs a rank-and-fuse module to assemble the best fragments into a single executable script. The resulting program is executed by an Executor that produces the task output and a fully inspectable reasoning trace. Without any additional training, Herald tackles heterogeneous vision workloads (image editing, semantic tagging, and medical triage) and achieves state-of-the-art or better accuracy on both medical and non-medical benchmarks. By transforming latent model competence into legible artefacts, Herald enables a transparent interaction style that invites user scrutiny, iterative refinement, and accountable auditing. All code and reproducible workflows are released at https://github.com/tgy1221/Herald, offering an open, resource-efficient alternative to premium LLM services.
Related Works
MizAR 60 for Mizar 50
2023 · 74,629 citations
ImageNet: A large-scale hierarchical image database
2009 · 60,679 citations
Microsoft COCO: Common Objects in Context
2014 · 41,307 citations
Fully convolutional networks for semantic segmentation
2015 · 36,406 citations
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,535 citations