OpenAlex · Updated hourly · Last updated: 21 May 2026, 05:41

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

RAG-Assisted Small Language Models for Domain-Level Reasoning

2025 · 2 citations
Open full text at publisher

Citations: 2 · Authors: 2 · Year: 2025

Abstract

The increasing demand for generative AI solutions has made privacy, efficiency, and reliability crucial requirements, especially in sensitive industries where incorrect or unverified outputs can lead to major risks. While large-scale models such as GPT-4 deliver strong performance, their computational cost and dependence on cloud infrastructure often make them unsuitable for highly regulated or offline deployment scenarios. Small Language Models (SLMs) offer a practical alternative due to their lower hardware requirements and ability to be fully self-hosted, but at the cost of reduced accuracy and weaker reasoning capabilities. In this work, we explore whether Retrieval-Augmented Generation (RAG) can meaningfully improve SLM performance and make these models more viable for real-world industrial adoption. We integrate RAG into two open-source models, LLaMa 3.1 8B and LLaMa 3.2 3B, both running locally via the Ollama inference engine. Their performance is evaluated on the MMLU-Pro benchmark, targeting domains where hallucination and factual grounding present real challenges. To support retrieval, we constructed domain-specific knowledge bases through synthetic data generation using the Gemini API with grounding and targeted legal web scraping, covering areas such as biology, virology, computer security, and common law. The knowledge-base passages are stored in a vector database, and retrieved passages are supplied as supporting context during inference, allowing the models to base their reasoning on verifiable information. The results show clear improvements in accuracy across several domains, demonstrating that RAG can help reduce hallucinations and enhance reliability without retraining. These findings suggest that, with carefully curated data and minimal resource overhead, small models can attain performance closer to that of much larger systems while maintaining full data control and regulatory compliance.
Overall, this research highlights a feasible and scalable path toward deploying domain-adapted AI locally in environments where privacy and resource constraints are critical.
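The abstract describes the retrieval step only at a high level: passages are stored in a vector database, and retrieved passages are added to the prompt as context before the model answers. The following is a minimal sketch of that flow under stated assumptions, not the authors' implementation. A toy term-frequency vector with cosine similarity stands in for the unspecified embedding model and vector database, and the knowledge-base passages and the helpers `embed`, `retrieve`, and `build_prompt` are hypothetical illustrations.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base (assumption): the paper's real passages came from
# grounded synthetic generation and web scraping, which are not reproduced here.
KNOWLEDGE_BASE = [
    "Reverse transcriptase lets retroviruses copy their RNA genome into DNA.",
    "A buffer overflow writes past the end of an allocated memory region.",
    "Under common law, consideration is required for a contract to be enforceable.",
]


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k passages most similar to the query (a vector-database lookup in the real setup)."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, options: list[str], k: int = 1) -> str:
    """Prepend retrieved passages to a multiple-choice question (hypothetical prompt template)."""
    context = "\n".join(f"- {p}" for p in retrieve(question, k))
    letters = "ABCDEFGHIJ"  # MMLU-Pro questions have up to ten answer options
    choices = "\n".join(f"{letters[i]}. {o}" for i, o in enumerate(options))
    return (
        "Use the context to answer the multiple-choice question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n{choices}\nAnswer with a single letter."
    )


if __name__ == "__main__":
    print(build_prompt(
        "Which enzyme lets a retrovirus copy its RNA genome into DNA?",
        ["DNA polymerase", "Reverse transcriptase", "Helicase", "Ligase"],
    ))
```

In the setup the abstract describes, the resulting prompt string would then be passed to the locally served small model; that call is left out here because the paper's exact prompt format and inference settings are not given on this page.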


Topics

Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education