
This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

GPTViet: An Open-Source Vietnamese Foundation Model from Pretraining to Domain Specialization

2025 · 0 citations

Citations: 0 · Authors: 6 · Year: 2025

Abstract

As open-source Large Language Models (LLMs) increasingly rival their proprietary counterparts, the need for foundational models tailored to specific linguistic and cultural contexts becomes critical. This paper presents GPTViet, a series of foundational LLMs for the Vietnamese language. Built upon the LLaMA architecture, GPTViet was developed by curating a high-quality Vietnamese corpus and performing extensive finetuning across a range of model sizes (8B to 70B parameters). Evaluations demonstrate that GPTViet models significantly outperform their respective base models on Vietnamese-specific tasks, as measured by standard benchmarks and our custom-built VietExam benchmark. The practical utility of this work is showcased through a domain-specific application, VietHealth 70B, for medical consultation. In keeping with the open-source principles of Llama, GPTViet and all its derivatives are publicly released under an open-source license. This initiative provides the Vietnamese research and development community with a powerful, adaptable foundation to accelerate the creation of diverse intelligent applications. For more information, visit https://github.com/VietnamAIHub/GPTViet; a demo is available at http://gptviet.ioit.ac.vn/.
