OpenAlex · Updated hourly · Last updated: 3 May 2026, 02:08

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating Generative AI Models for Code Generation Tasks Using Embedding-Based Semantic Similarity

2025 · 0 citations

Citations: 0
Authors: 2
Year: 2025

Abstract

Generative artificial intelligence (AI) is rapidly transforming software development, especially in code generation. Large Language Models (LLMs) show strong potential for automating programming tasks, though their performance varies with task complexity. This study systematically evaluates state-of-the-art models: OpenAI’s GPT-4.5 Preview, GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-3.5 Turbo, GPT-o1, and GPT-o3 Mini; Google’s Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 2.0 Flash, and Gemini 2.0 Flash Lite; Anthropic’s Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku, Claude 3.5 Sonnet, Claude 3.5 Haiku, and Claude 3.7 Sonnet; and Meta’s LLaMA 3.0 8B Instruct and LLaMA 3.1 8B Instruct. The models were tested on ten Python programming tasks (five simple and five complex) and evaluated using an embedding-based semantic similarity approach. High-performing models such as GPT-4.5 Preview and GPT-4o Mini consistently produced accurate outputs, while LLaMA 3.1 8B Instruct performed worst. Interestingly, complex tasks yielded higher similarity scores than simple ones, likely due to their structured outputs. The results highlight the need for metrics that complement semantic similarity, including execution correctness and efficiency. This study offers practical insights into AI-assisted coding and points toward future research directions for improving generative models in real-world applications.
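As a rough illustration of what an embedding-based similarity score computes, the sketch below compares a reference solution with two candidate outputs using cosine similarity. The abstract does not name the embedding model or the similarity measure used in the study, so both are assumptions here: `toy_embed` is a hypothetical stand-in that counts tokens rather than calling a learned embedding model, and cosine similarity is simply a common choice for comparing embedding vectors.

```python
import math
import re
from collections import Counter


def toy_embed(code: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a token-count vector.

    A real evaluation would use a learned embedding model; this is only
    a placeholder so the example runs without external services.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return Counter(tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative snippets only; these are not tasks or outputs from the study.
reference = "def add(a, b):\n    return a + b\n"
generated_close = "def add(x, y):\n    return x + y\n"
generated_far = "print('hello world')\n"

score_close = cosine_similarity(toy_embed(reference), toy_embed(generated_close))
score_far = cosine_similarity(toy_embed(reference), toy_embed(generated_far))

print(f"close candidate: {score_close:.3f}")
print(f"far candidate:   {score_far:.3f}")
```

A score of this kind rewards outputs that resemble the reference, not outputs that run correctly, which is why the abstract calls for complementary checks such as execution correctness and efficiency.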

Topics

Artificial Intelligence in Healthcare and Education
Software Engineering Research
Machine Learning in Materials Science