This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
AI-Assisted Corpus Linguistics: Integrating NLP Models Into Corpus Analysis
Citations: 0
Authors: 1
Year: 2026
Abstract
Integrating natural language processing (NLP) and artificial intelligence (AI) models into corpus linguistics has opened new avenues for linguistic analysis, yet their suitability for rigorous academic research remains debated because of their opacity and limited interpretability. This systematic review examines how NLP models transform traditional corpus linguistics methodologies, focusing on their applications, benefits, and challenges. Following a PRISMA-guided approach, the study reviewed peer-reviewed literature published from 2013 to 2025 in databases such as Scopus and the ACL Anthology, using keywords such as “AI in corpus linguistics” and “NLP corpus analysis”. Inclusion criteria targeted studies that apply NLP models (e.g., BERT, GPT) to linguistic tasks; screening 922 records yielded 12 selected studies. Study quality was assessed with the CASP checklist, and the findings were then synthesized thematically. The results indicate that NLP models enhance corpus analysis by automating tasks such as keyword extraction and pragmatic annotation while offering scalability and semantic depth. Applications span discourse analysis, diachronic studies, and sociolinguistic variation, supported by tools such as CorpusChat and Hugging Face Transformers. Challenges include model bias, lack of transparency, and domain mismatch. The study concludes that AI-driven NLP models significantly advance corpus linguistics, but that ethical, privacy, and reproducibility concerns must be addressed to ensure academic rigor. Future research should focus on developing domain-specific models and improving interpretability to fully harness AI’s potential in linguistic studies.
Similar works
2019 · 31,549 citations
Techniques to Identify Themes
2003 · 5,374 citations
Answering the Call for a Standard Reliability Measure for Coding Data
2007 · 4,063 citations
Basic Content Analysis
1990 · 4,045 citations
Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts
2013 · 3,049 citations