Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Errors in AI-Assisted Retrieval of Medical Literature: A Comparative Study

2026·0 Zitationen·arXiv (Cornell University)Open Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2026

Jahr

Abstract

Large language models (LLMs) assisted literature retrieval may lead to erroneous references, but these errors have not been rigorously quantified. Therefore, we quantitatively assess errors in reference retrieval of widely used free-version LLM platforms and identify the factors associated with retrieval errors. We evaluated 2,000 references retrieved by 5 LLMs (Grok-2, ChatGPT GPT-4.1, Google Gemini Flash 2.5, Perplexity AI, and DeepSeek GPT-4) for 40 randomly-selected original articles (10 per journal) published Jan. 2024 to July 2025 from British Medical Journal (BMJ), Journal of the American Medical Association, and The New England Journal of Medicine (NEJM). Primary outcomes were a multimetric score ratio combining validity of digital object identifier, PubMed ID, Google-Scholar link, and relevance; and complete miss rate (proportion of references failing all applicable metrics). Multivariable regression was used to examine independent associations. LLM platforms completely failed to retrieve correct reference data 47.8% of the time. The average score ratio of the 5 LLM platforms was 0.29 (standard deviation, 0.35; range, 0-1.25), with a higher score ratio indicating a higher accuracy in retrieving relevant references and correct bibliographic data. The highest and lowest accuracies were achieved by Grok (0.57) and Genimi (0.11), respectively. Compared with BMJ, NEJM articles had lower score ratios and higher complete miss rates. Multivariable analysis shows LLM platforms and journals were independently associated with score ratios and complete miss rate, respectively. We show modest overall performance of LLMs and significant variability in retrieval accuracy across platforms and journals. LLM platforms and journals are associated with LLM's performance in retrieving medical literature. Bibliographic data should be carefully reviewed when using LLM-assisted literature retrieval.

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationBiomedical Text Mining and OntologiesMeta-analysis and systematic reviews

Volltext beim Verlag öffnen

Errors in AI-Assisted Retrieval of Medical Literature: A Comparative Study

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen