This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
CamelEval: Advancing Benchmarks for Arabic Language Models in Generative Tasks
Citations: 0
Authors: 6
Year: 2025
Abstract
Large Language Models (LLMs) serve as the foundation of contemporary artificial intelligence systems. Recently, a diverse range of Arabic-centric LLMs has emerged, and with them a variety of evaluation suites designed to assess the alignment of LLMs with the values and preferences of Arabic speakers and to assess their capabilities in instruction following, open-ended question answering, and information delivery. However, the majority of these suites rely exclusively on multiple-choice questions and thereby fail to adequately assess the text generation capabilities of LLMs. To address this shortcoming, we propose a new automated evaluation benchmark, CamelEval. CamelEval comprises three test suites that evaluate general instruction following, factuality, and cultural alignment. Each test suite contains 805 carefully curated, challenging test cases that reflect the nuances of the Arabic language and culture. We envision CamelEval as a tool to guide the development of future Arabic LLMs, serving over 400 million Arabic speakers by providing LLMs that not only communicate in their language but also understand their culture.