OpenAlex · Updated hourly · Last updated: 27.03.2026, 04:16

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Meta-analysis of large language models: benchmarking DeepSeek-R1 against ChatGPT, Gemini, Qwen, and LLaMA

2025 · 1 citation · Journal of Big Data · Open Access
Open full text at the publisher

Citations: 1 · Authors: 8 · Year: 2025

Abstract

The rapid evolution of large language models (LLMs) such as GPT-4 Turbo, Google Gemini, Qwen, Meta’s LLaMA 3.1, and DeepSeek-R1 has redefined the landscape of artificial intelligence. In this study, we conduct a hybrid meta-analysis integrating publicly available benchmarks, model cards, technical reports, and open-source repositories to evaluate LLMs across both performance and operational dimensions. Quantitative data were aggregated from standardized tasks such as MMLU (reasoning), HumanEval (code generation), FLORES-200 (translation), and TyDiQA (multilingual Q&A), complemented by efficiency metrics including FLOPs, GPU hours, inference latency, and subscription costs. A big data–driven KPI framework covering scalability index, data-throughput rate, energy per token, and training cost efficiency was applied to enable normalized, cross-model comparison. Results indicate that DeepSeek-R1 demonstrates strong coding and multilingual efficiency, ChatGPT-4 Turbo leads in reasoning accuracy, Gemini Ultra excels in multimodal inference, Qwen is competitive in Chinese-language tasks, and LLaMA 3.1 remains the most adaptable open-source option. Across datasets, DeepSeek-R1 achieved 80.2 ± 1.5% on HumanEval and 78.5 ± 1.8% on MMLU, compared with ChatGPT-4 Turbo’s 86.5 ± 1.9%; these gaps fall within the observed heterogeneity (I² = 14.6%). The findings highlight trade-offs among accuracy, scalability, and cost efficiency, emphasizing the need for transparent, sustainable, and multimodal LLM development.
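The abstract describes a KPI framework that normalizes heterogeneous metrics (scalability index, data-throughput rate, energy per token, training cost efficiency) for cross-model comparison, but this page does not publish the normalization procedure. The sketch below assumes simple min-max scaling with a direction flag for "lower is better" metrics; the model names match the abstract, while the `normalize` helper and every KPI value are hypothetical illustrations, not the paper's data.

```python
# Minimal sketch of min-max normalization for cross-model KPI comparison.
# Assumption: the paper's framework maps each KPI to [0, 1]; all numbers
# below are hypothetical placeholders, not results from the study.

def normalize(values, higher_is_better=True):
    """Scale raw KPI values to [0, 1]; invert when lower values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)          # all models tie on this KPI
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

models = ["DeepSeek-R1", "ChatGPT-4 Turbo", "Gemini Ultra", "Qwen", "LLaMA 3.1"]
throughput = [95.0, 88.0, 90.0, 85.0, 80.0]   # tokens/s, higher is better (hypothetical)
energy = [0.8, 1.2, 1.1, 0.9, 1.0]            # J/token, lower is better (hypothetical)

norm_tp = normalize(throughput, higher_is_better=True)
norm_en = normalize(energy, higher_is_better=False)
for m, t, e in zip(models, norm_tp, norm_en):
    print(f"{m:16s} throughput={t:.2f} energy={e:.2f}")
```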
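The abstract also reports between-dataset heterogeneity as I² = 14.6%. A standard way to obtain I² in a fixed-effect meta-analysis is via Cochran's Q over inverse-variance-weighted effect sizes; the sketch below shows that textbook computation. The input effect sizes and standard errors are hypothetical placeholders, not values extracted from the paper.

```python
# Minimal sketch of Cochran's Q and the I^2 heterogeneity statistic for a
# fixed-effect meta-analysis. Inputs are hypothetical, not the paper's data.

def i_squared(effects, std_errs):
    """Return (Q, I^2 in percent) for inverse-variance-weighted effects."""
    weights = [1.0 / se**2 for se in std_errs]                     # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled)**2 for w, y in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0           # I^2 = (Q - df)/Q
    return q, i2

# Hypothetical per-dataset accuracy estimates (%) with standard errors
effects = [80.2, 78.5, 82.1]
std_errs = [1.5, 1.8, 1.6]
q, i2 = i_squared(effects, std_errs)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```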

Similar works