This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Meta-analysis of large language models: benchmarking DeepSeek-R1 against ChatGPT, Gemini, Qwen, and LLaMA
1
Citations
8
Authors
2025
Year
Abstract
The rapid evolution of large language models (LLMs) such as GPT-4 Turbo, Google Gemini, Qwen, Meta’s LLaMA 3.1, and DeepSeek-R1 has redefined the landscape of artificial intelligence. In this study, we conduct a hybrid meta-analysis integrating publicly available benchmarks, model cards, technical reports, and open-source repositories to evaluate LLMs across both performance and operational dimensions. Quantitative data were aggregated from standardized tasks such as MMLU (reasoning), HumanEval (code generation), FLORES-200 (translation), and TyDiQA (multilingual Q&A), complemented by efficiency metrics including FLOPs, GPU hours, inference latency, and subscription costs. A big data–driven KPI framework covering scalability index, data-throughput rate, energy per token, and training cost efficiency was applied to enable normalized, cross-model comparison. Results indicate that DeepSeek-R1 demonstrates strong coding and multilingual efficiency, ChatGPT-4 Turbo leads in reasoning accuracy, Gemini Ultra excels in multimodal inference, Qwen is competitive in Chinese-language tasks, and LLaMA 3.1 remains the most adaptable open-source option. Across datasets, DeepSeek-R1 achieved 80.2 ± 1.5% on HumanEval and 78.5 ± 1.8% on MMLU, compared with ChatGPT-4 Turbo’s 86.5 ± 1.9%; these gaps fall within observed heterogeneity (I² = 14.6%). The findings highlight trade-offs among accuracy, scalability, and cost efficiency, emphasizing the need for transparent, sustainable, and multimodal LLM development.
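The abstract reports heterogeneity as an I² value. As a rough illustration of what that statistic measures, the sketch below computes Cochran's Q and I² from per-source scores and standard errors using inverse-variance weights. This is the common textbook definition and only an assumption about how such a figure could be derived; the abstract does not state the authors' procedure. The function name `i_squared` and all inputs are hypothetical, not data from the paper.

```python
def i_squared(effects, std_errors):
    """Return (Q, I2 in percent) for a set of study-level estimates.

    Assumption: the standard inverse-variance (fixed-effect) weighting,
    with I2 = max(0, (Q - df) / Q) * 100. Inputs are hypothetical.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two estimates with matching lengths")

    # Inverse-variance weights: more precise estimates count more.
    weights = [1.0 / se ** 2 for se in std_errors]

    # Weighted pooled estimate across all sources.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

    # I2: share of total variation beyond what chance alone would produce.
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    # Hypothetical scores (percent) and standard errors from three sources.
    q, i2 = i_squared([78.0, 80.0, 79.0], [1.5, 1.5, 1.5])
    print(f"Q = {q:.3f}, I2 = {i2:.1f}%")
```

A low I² suggests that most of the spread between sources is consistent with sampling error rather than genuine disagreement among them.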
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,316 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,177 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,575 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,468 citations