Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
From Data to Deployment: A Comprehensive Analysis of Risks in Large Language Model Research and Development
0
Zitationen
5
Autoren
2025
Jahr
Abstract
Large language models (LLMs) have evolved significantly, achieving unprecedented linguistic capabilities that underpin a wide range of AI applications. However, they also pose risks and challenges such as ethical concerns, bias and computational sustainability. How to balance the high performance in revolutionising information processing with the risks they pose is critical to their future development. LLM is a type of NLP model and many of the LLM risks are also risks that NLP has experienced in the past. We, therefore, summarise these risks, focusing more on the underlying understanding of these risks/technical tools, rather than simply describing their occurrence in LLM. In this paper, we first discuss and compare the current state of research on the four main risks in the process of developing LLMs: data, system, pretraining and inference, and then, try to summarise the rationale, complexity, prospects and challenges of the key issues and challenges in each phase. Finally, this review concludes with a discussion of the fundamental issues that should be of most concern and risk and that should be addressed in the early stages of modelling research, including the correlated issues of privacy preservation and countering attacks and model robustness. Based on the LLM research and development (R&D) process perspective, this review summarises the actual risks and provides guidance for research directions, with the aim of helping researchers to identify these risk points and technology directions worth investigating, as well as helping to establish a safe and efficient R&D process.
Ähnliche Arbeiten
k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY
2002 · 8.401 Zit.
Calibrating Noise to Sensitivity in Private Data Analysis
2006 · 6.886 Zit.
Deep Learning with Differential Privacy
2016 · 5.612 Zit.
Communication-Efficient Learning of Deep Networks from Decentralized\n Data
2016 · 5.593 Zit.
Large-Scale Machine Learning with Stochastic Gradient Descent
2010 · 5.570 Zit.