This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

A Comprehensive Benchmark of Tool-Augmented Large Language Models for Biomedical Knowledge Retrieval and Integration

2025 · 0 citations

Year: 2025

Abstract

General-purpose large language models (LLMs) often struggle in specialized biomedical applications due to their limited access to up-to-date, structured knowledge and domain-specific tools. Although recent studies show proof of life for AI agents in increasingly complex scientific tasks, few offer insights at an atomic level. We present the first large-scale evaluation of LLM tool-calling capabilities, focused on genomics annotation tasks such as variant-to-position and variant-to-gene mapping. Our study benchmarks over one hundred LLMs, via OpenRouter's metagateway, using a standardized tool-calling protocol to retrieve information directly from biomedical APIs (e.g., NCBI dbSNP, Entrez Gene). Our experimental results show that models equipped with structured tool access significantly outperform prompt-only baselines in accuracy, factual consistency, and verifiability. These findings demonstrate the necessity of tool augmentation for reliable biomedical reasoning and provide practical insights for building and testing LLM-based agents across diverse biomedical workflows.

Topics

Artificial Intelligence in Healthcare and Education; Genomics and Rare Diseases; Topic Modeling
