OpenAlex · Updated hourly · Last updated: 19.05.2026, 02:19

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparing the Accuracy of Four Artificial Intelligence Models in PubMed Citation Generation for Glaucoma Research

2026 · 0 citations · Journal of Glaucoma

Citations: 0 · Authors: 2 · Year: 2026

Abstract

PRÉCIS: DeepSeek, a biomedically enriched AI model, achieved the highest accuracy in generating PubMed citations for glaucoma research, outperforming general-purpose models and highlighting the necessity of human oversight to mitigate AI-related citation errors.

PURPOSE: This study evaluated the accuracy and reliability of four artificial intelligence (AI) models in generating PubMed citations for glaucoma research: ChatGPT (OpenAI GPT-3.5), Copilot (GitHub/Microsoft), DeepSeek (DeepSeek AI), and Gemini (Google AI). It aimed to assess the potential of AI tools for academic reference generation and to identify their limitations, particularly in specialized ophthalmology fields.

METHODS: Thirty-five standardized clinical paragraphs from The Review of Ophthalmology (4th edition) were used to test citation accuracy. Each model was instructed to generate AMA 11-style PubMed citations. Citations were evaluated for accuracy, DOI matching, and clinical relevance. An expert review validated the outputs and classified them as "Fully Cited," "Partially Cited," or "Not Cited."

RESULTS: DeepSeek, a biomedically enriched model, outperformed the others, with an accuracy of 92.0%. Copilot and Gemini achieved moderate accuracies of 66.7% and 25.8%, respectively, while ChatGPT achieved the lowest citation accuracy at 19.4%. Frequent errors included DOI mismatches, incorrect journal names, and irrelevant references. Expert review confirmed that even the best model produced citation errors, emphasizing the need for human oversight. We interpret this apparent advantage cautiously, as model details, updates, and changes in underlying data may influence performance.

CONCLUSION: AI models, particularly biomedically enriched tools such as DeepSeek, can accelerate citation drafting, but citation hallucinations and metadata errors remain common. AI should serve as a decision support tool for reference retrieval and formatting, not a substitute for rigorous manual verification before submission.

Topics

Artificial Intelligence in Healthcare and Education · Biomedical Text Mining and Ontologies · Meta-analysis and systematic reviews