This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A Comparative Evaluation of ChatGPT, DeepSeek and Gemini in Automatic Unit Test Generation: a Success Rate Analysis
Citations: 0
Authors: 5
Year: 2025
Abstract
The advancement of large language models (LLMs) has opened up new possibilities for automating unit test generation, a traditionally manual and expensive task. This quantitative study evaluates the performance of three LLMs (ChatGPT 4o mini, DeepSeek v3, and Gemini 2.5 Flash Pro) in generating test cases for C# methods developed in Unity. The execution success rate of the generated tests was measured using both real and synthetic data: the synthetic data was intentionally created to represent common code structures, while the real data came from existing project functions. The experimental design was controlled and included the factors LLM and data type, with cyclomatic complexity and contextual memory as blocking variables and four replicates per combination, for a total of 96 experimental treatments. The results show that LLMs have high potential to support the automatic generation of unit tests. Furthermore, the choice of model was shown to have a significant effect on the success rate of the generated tests. These findings provide useful initial evidence to guide the selection and use of LLMs in test automation processes within software development environments.
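The stated total of 96 treatments is consistent with a full factorial layout over the two factors and two blocks. A minimal sketch of that enumeration follows; the two levels assumed for each block (low/high complexity, with/without contextual memory) are an assumption, since the abstract does not name them:

```python
from itertools import product

# Hypothetical reconstruction of the design described in the abstract:
# 3 LLMs x 2 data types, blocked by cyclomatic complexity and contextual
# memory (2 levels each, assumed), with 4 replicates per combination.
llms = ["ChatGPT 4o mini", "DeepSeek v3", "Gemini 2.5 Flash Pro"]
data_types = ["real", "synthetic"]
complexity = ["low", "high"]          # assumed block levels
memory = ["with", "without"]          # assumed block levels
replicates = range(1, 5)              # 4 replicates

# Enumerate every treatment combination in the full factorial design.
treatments = list(product(llms, data_types, complexity, memory, replicates))
print(len(treatments))  # 3 * 2 * 2 * 2 * 4 = 96
```

Under these assumed block levels, the enumeration reproduces the 96 treatments reported in the abstract.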