This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Predicting Emergent Tool Use in LLMs Before It Emerges: A Proxy Perspective
Citations: 0
Authors: 4
Year: 2026
Abstract
Tool-use capabilities fundamentally transform large language models (LLMs) from passive language generators into active agents with real-world utility, drawing intense research focus. Yet, their emergent nature renders traditional scaling laws ineffective for early-stage prediction, obstructing principled model design and efficient training. In this work, we propose a proxy-task perspective that predicts tool-use capabilities by measuring early model performance on selected non-emergent proxy tasks. Our method quantifies two properties of each proxy task: alignment, which reflects how well it captures tool-use trajectories, and stability, which indicates how consistently it behaves across training conditions. These properties are used to weight predictive signals. Theoretically, we formalize how these weighted signals approximate emergent tool use through bounded extrapolation under relaxed assumptions. Empirically, we validate our approach across training checkpoints, model scales, and data setups. Results show that a carefully weighted ensemble of proxy tasks can accurately rank downstream tool-use ability long before it arises. Our findings provide new theoretical foundations and practical tools for efficient training and capability planning, and advance the understanding of how complex abilities arise in LLMs.
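The abstract does not state the formulas, so the following is only a minimal sketch of how a weighted ensemble of proxy tasks could rank models. It assumes that each proxy task has an alignment and a stability score in [0, 1], that a task's weight is the normalized product of the two, and that the predictive signal is the weighted sum of early proxy-task scores. All names (`ProxyTask`, `proxy_weights`, `predicted_signal`, `rank_models`) and the product weighting rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ProxyTask:
    """A non-emergent proxy task with two hypothetical quality scores."""
    name: str
    alignment: float  # assumed in [0, 1]: how well the task tracks tool use
    stability: float  # assumed in [0, 1]: consistency across training setups


def proxy_weights(tasks):
    """Weight each task by alignment * stability, normalized to sum to 1.

    The product rule is an illustrative assumption, not the paper's method.
    """
    raw = [t.alignment * t.stability for t in tasks]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one task needs a positive weight")
    return {t.name: r / total for t, r in zip(tasks, raw)}


def predicted_signal(scores, weights):
    """Weighted sum of one model's early proxy-task scores."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_models(model_scores, tasks):
    """Return model names ordered from highest to lowest predicted signal."""
    weights = proxy_weights(tasks)
    signals = {
        model: predicted_signal(scores, weights)
        for model, scores in model_scores.items()
    }
    return sorted(signals, key=signals.get, reverse=True)
```

In this sketch, a task with zero stability contributes nothing to the ranking regardless of its alignment, which mirrors the idea that unstable proxies should be down-weighted. Only the ordering of models is used, in line with the abstract's claim about ranking downstream ability.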