This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluation of a Popular Large Language Model in Orthopedic Literature Review: Comparison to Previously Published Reviews.
Citations: 1
Authors: 5
Year: 2025
Abstract
Objectives: Large language models (LLMs) may improve the process of conducting systematic literature reviews. Our aim was to evaluate the utility of one popular LLM chatbot, Chat Generative Pre-trained Transformer (ChatGPT), in systematic literature reviews when compared to traditionally conducted reviews.
Methods: We identified five systematic reviews published in the Journal of Bone and Joint Surgery from 2021 to 2022. We retrieved the clinical questions, methodologies, and included studies for each review. We evaluated ChatGPT's performance on three tasks. (1) For each published systematic review's core clinical question, ChatGPT designed a relevant database search strategy. (2) ChatGPT screened the abstracts of those articles identified by that search strategy for inclusion in a review. (3) For one systematic review, ChatGPT reviewed each individual manuscript identified after screening to identify those that fit inclusion criteria. We compared the performance of ChatGPT on each of these three tasks to the previously published systematic reviews.
Results: ChatGPT captured a median of 91% (interquartile range, IQR 84%, 94%) of articles in the published systematic reviews. After screening of these abstracts, ChatGPT was able to capture a median of 75% (IQR 70%, 79%) of articles included in the published systematic reviews. On in-depth screening of manuscripts, ChatGPT captured only 55% of target publications; however, this improved to 100% on review of the manuscripts that ChatGPT identified on this step. Qualitative analysis of ChatGPT's performance highlighted the importance of prompt design and engineering.
Conclusion: Using published reviews as a gold standard, ChatGPT demonstrated ability in replicating fundamental tasks for orthopedic systematic review. Cautious use and supervision of this general purpose LLM, ChatGPT, may aid in the process of systematic literature review. Further study and discussion regarding the role of LLMs in literature review is needed.
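The capture percentages reported in the abstract can be read as a per-review recall against the published review's included studies, summarized as a median with an interquartile range. The sketch below illustrates that reading only. This page does not state the authors' exact computation, so the definition of capture rate, the quartile method, and every name and article identifier in the code are assumptions made for illustration, and the toy data are not the study's data.

```python
from statistics import median, quantiles


def capture_rate(gold_ids, retrieved_ids):
    """Fraction of gold-standard articles that also appear in the retrieved set."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold standard must not be empty")
    return len(gold & set(retrieved_ids)) / len(gold)


def summarize(rates):
    """Return (median, Q1, Q3); the 'inclusive' quartile method is an assumption."""
    q1, _, q3 = quantiles(rates, n=4, method="inclusive")
    return median(rates), q1, q3


# Hypothetical toy data: (articles in the published review, articles the LLM retrieved).
reviews = {
    "review_a": ({"p1", "p2", "p3", "p4"}, {"p1", "p2", "p3", "x9"}),
    "review_b": ({"p5", "p6"}, {"p5", "p6", "x1"}),
    "review_c": ({"p7", "p8", "p9", "p10"}, {"p7", "p8"}),
}

rates = [capture_rate(gold, retrieved) for gold, retrieved in reviews.values()]
med, q1, q3 = summarize(rates)
print(f"median {med:.0%} (IQR {q1:.0%}, {q3:.0%})")
```

Articles retrieved by the model but absent from the published review (such as the hypothetical `x9`) do not lower this measure; tracking them would require a separate precision figure.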