OpenAlex · Updated hourly · Last updated: 03.04.2026, 06:00

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Herald: Democratizing Compositional Reasoning for Visual Tasks without Any Training

2025 · 0 citations · 5 authors

Abstract

Premium large language models (LLMs) such as GPT-4 offer impressive multimodal performance, yet their paywall limits both accessibility and reproducibility. We ask whether a coalition of freely accessible LLMs, each individually noisy and uncertain, can collectively rival or surpass their premium counterparts when treated as collaborative, on-the-fly programmers. We present Herald, a framework that (i) primes a diverse pool of zero-cost API LLMs with chain-of-thought cues, (ii) harvests their responses as human-readable Python fragments, control-flow branches, and live API calls, and (iii) employs a rank-and-fuse module to assemble the best fragments into a single executable script. The resulting program is executed by an Executor that produces the task output and a fully inspectable reasoning trace. Without any additional training, Herald tackles heterogeneous vision workloads (image editing, semantic tagging, and medical triage) and achieves state-of-the-art or better accuracy on both medical and non-medical benchmarks. By transforming latent model competence into legible artefacts, Herald enables a transparent interaction style that invites user scrutiny, iterative refinement, and accountable auditing. All code and reproducible workflows are released at https://github.com/tgy1221/Herald, offering an open, resource-efficient alternative to premium LLM services.
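To make the three-stage pipeline in the abstract concrete, the sketch below shows one plausible shape for the prime → harvest → rank-and-fuse → execute loop. All names, the syntactic harvesting filter, and the scoring heuristic are illustrative assumptions, not the authors' implementation; the actual code lives in the linked repository.

```python
# Minimal sketch of the pipeline described in the abstract.
# Everything here (COT_PREFIX, the scoring heuristic, function names)
# is a hypothetical illustration, not Herald's real implementation.
from typing import Callable, List

COT_PREFIX = (
    "Think step by step, then answer ONLY with a runnable Python "
    "fragment that solves the task:\n"
)

def prime_pool(models: List[Callable[[str], str]], task: str) -> List[str]:
    """Stage (i): prompt every zero-cost model with a chain-of-thought cue."""
    return [model(COT_PREFIX + task) for model in models]

def harvest_fragments(responses: List[str]) -> List[str]:
    """Stage (ii): keep only responses that parse as valid Python."""
    fragments = []
    for text in responses:
        try:
            compile(text, "<fragment>", "exec")  # syntactic sanity check
            fragments.append(text)
        except SyntaxError:
            continue  # discard noisy, non-executable answers
    return fragments

def rank_and_fuse(fragments: List[str]) -> str:
    """Stage (iii): score fragments and fuse the best into one script.
    The score here (reward control flow, penalize length) is a crude
    placeholder for the paper's rank-and-fuse module."""
    def score(frag: str) -> float:
        branches = frag.count("if ") + frag.count("for ")
        return branches - 0.001 * len(frag)
    ranked = sorted(fragments, key=score, reverse=True)
    return "\n\n".join(ranked[:2])  # assemble the top-ranked fragments

def executor(script: str) -> dict:
    """Run the fused script; its namespace serves as an inspectable trace."""
    namespace: dict = {}
    exec(script, namespace)  # assumes trusted input in this sketch
    return namespace
```

A caller would wire real free-tier chat endpoints into `models` and read the task output out of the returned namespace; the fused script itself doubles as the human-readable reasoning trace the abstract describes.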

Topics

Multimodal Machine Learning Applications
Artificial Intelligence in Healthcare and Education
Machine Learning in Materials Science