This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating Generative AI Models for Code Generation Tasks Using Embedding-Based Semantic Similarity
Citations: 0
Authors: 2
Year: 2025
Abstract
Generative artificial intelligence (AI) is rapidly transforming software development, especially in code generation. Large Language Models (LLMs) show strong potential for automating programming tasks, though their performance varies with task complexity. This study systematically evaluates state-of-the-art models, including OpenAI’s GPT-4.5 Preview, GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-3.5 Turbo, GPT-o1, GPT-o3 Mini, Google’s Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 2.0 Flash, Gemini 2.0 Flash Lite, Anthropic’s Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3.7 Sonnet, and Meta’s LLaMA 3.0 8B Instruct and LLaMA 3.1 8B Instruct. These models were tested on ten Python programming tasks (five simple and five complex) and evaluated using an embedding-based semantic similarity approach. High-performing models such as GPT-4.5 Preview and GPT-4o Mini consistently produced accurate outputs, while LLaMA 3.1 8B Instruct performed weakest. Interestingly, complex tasks yielded higher similarity scores, likely due to their structured outputs. The results highlight the need for complementary metrics beyond semantic similarity, including execution correctness and efficiency. This study offers practical insights into AI-assisted coding and points toward future research directions for improving generative models in real-world applications.
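To make the evaluation idea concrete, the following is a minimal sketch of similarity scoring between a reference solution and a generated solution. The abstract does not say which embedding model or similarity measure the study used, so everything here is an assumption for illustration: `toy_embed` is a hypothetical stand-in that counts tokens instead of calling a learned embedding model, and cosine similarity is assumed as the comparison measure.

```python
import math
import re
from collections import Counter


def toy_embed(code: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a sparse token-count vector.

    A real setup would send the code to a learned embedding model; this
    function only keeps the example self-contained.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative inputs (not taken from the study's task set).
reference = "def add(a, b):\n    return a + b"
candidate_close = "def add(x, y):\n    return x + y"
candidate_far = "print('hello world')"

score_close = cosine_similarity(toy_embed(reference), toy_embed(candidate_close))
score_far = cosine_similarity(toy_embed(reference), toy_embed(candidate_far))
print(f"close: {score_close:.3f}, far: {score_far:.3f}")
```

A score of this kind only measures how alike two pieces of code look in the chosen vector space. It does not run the code, which is why the abstract calls for complementary metrics such as execution correctness and efficiency.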
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,557 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,447 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,944 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,797 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations