This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A Comprehensive Benchmark of Tool-Augmented Large Language Models for Biomedical Knowledge Retrieval and Integration
0
Citations
4
Authors
2025
Year
Abstract
General-purpose large language models (LLMs) often struggle in specialized biomedical applications because they have limited access to up-to-date, structured knowledge and domain-specific tools. Although recent studies show proof of life for AI agents in increasingly complex scientific tasks, few offer insights at the atomic level. We present the first large-scale evaluation of LLM tool-calling capabilities, focused on genomics annotation tasks such as variant-to-position and variant-to-gene mapping. Our study benchmarks over one hundred LLMs, accessed via OpenRouter's metagateway, using a standardized tool-calling protocol to retrieve information directly from biomedical APIs (e.g., NCBI dbSNP, Entrez Gene). Our experimental results show that models equipped with structured tool access significantly outperform prompt-only baselines in accuracy, factual consistency, and verifiability. These findings demonstrate the necessity of tool augmentation for reliable biomedical reasoning and provide practical insights for building and testing LLM-based agents across diverse biomedical workflows.
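To make the kind of tool call described in the abstract concrete, here is a minimal sketch of a variant-to-gene tool in the JSON-schema style accepted by OpenAI-compatible chat-completion APIs. It is not the authors' benchmark harness. The tool name `variant_to_gene`, the `rsid` parameter, the `dispatch_tool_call` helper, and the in-memory lookup table are all assumptions made for illustration; a real setup would replace the table with a query to NCBI dbSNP.

```python
import json
from typing import Callable, Dict

# Tool description handed to the model. The name and parameter are
# hypothetical; only the overall schema layout follows the common
# OpenAI-compatible "function" tool format.
VARIANT_TO_GENE_TOOL = {
    "type": "function",
    "function": {
        "name": "variant_to_gene",
        "description": "Map a dbSNP rsID to the gene symbol it lies in.",
        "parameters": {
            "type": "object",
            "properties": {
                "rsid": {
                    "type": "string",
                    "description": "dbSNP identifier, e.g. rs429358",
                },
            },
            "required": ["rsid"],
        },
    },
}

# Assumption: a one-entry, hard-coded stand-in for a live dbSNP lookup,
# so the sketch runs without network access.
_STUB_DBSNP = {"rs429358": "APOE"}


def variant_to_gene(rsid: str) -> dict:
    """Return the gene symbol for an rsID, or an error record if unknown."""
    gene = _STUB_DBSNP.get(rsid.strip().lower())
    if gene is None:
        return {"rsid": rsid, "error": "not found"}
    return {"rsid": rsid, "gene": gene}


# Registry of callable tools, keyed by the name the model will emit.
TOOLS: Dict[str, Callable[..., dict]] = {"variant_to_gene": variant_to_gene}


def dispatch_tool_call(tool_call: dict) -> str:
    """Run one model-issued tool call and return the result as a JSON string."""
    fn = tool_call["function"]
    name = fn["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        # Tool arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(fn["arguments"])
    except json.JSONDecodeError:
        return json.dumps({"error": "arguments were not valid JSON"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return json.dumps(TOOLS[name](**args))
    except TypeError:
        # Raised when the model supplies missing or unexpected parameters.
        return json.dumps({"error": "arguments did not match the tool signature"})
```

In a setup like this, the answer comes from the tool result rather than from the model's memory, which is what makes it verifiable: the returned JSON can be checked directly against the source database. Malformed calls come back as error records instead of raising, so a harness can score them as failures without crashing.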
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,324 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,189 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,588 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,470 citations