OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 19.04.2026, 15:22

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

What characteristics make ChatGPT effective for software issue resolution? An empirical study of task, project, and conversational signals in GitHub issues

2025·0 Zitationen·Empirical Software EngineeringOpen Access
Volltext beim Verlag öffnen

0

Zitationen

5

Autoren

2025

Jahr

Abstract

Abstract Conversational large-language models (LLMs), such as ChatGPT, are extensively used for issue resolution tasks, particularly for generating ideas to implement new features or resolve bugs. However, not all developer-LLM conversations are useful for effective issue resolution and it is still unknown what makes some of these conversations not helpful. In this paper, we analyze 686 developer-ChatGPT conversations shared within GitHub issue threads to identify characteristics that make these conversations effective for issue resolution. First, we empirically analyze the conversations and their corresponding issue threads to distinguish helpful from unhelpful conversations. We begin by categorizing the types of tasks developers seek help with (e.g., code generation , bug identification and fixing , test generation ), to better understand the scenarios in which ChatGPT is most effective. Next, we examine a wide range of conversational, project, and issue-related metrics to uncover statistically significant factors associated with helpful conversations. Finally, we identify common deficiencies in unhelpful ChatGPT responses to highlight areas that could inform the design of more effective developer-facing tools. We found that only 62% of the ChatGPT conversations were helpful for successful issue resolution. Among different tasks related to issue resolution, ChatGPT was most helpful in assisting with code generation, and tool/library/API recommendations, but struggled with generating code explanations. Our conversational metrics reveal that helpful conversations are shorter, more readable, and exhibit higher semantic and linguistic alignment. Our project metrics reveal that larger, more popular projects and experienced developers benefit more from ChatGPT’s assistance. Our issue metrics indicate that ChatGPT is more effective on simpler issues characterized by limited developer activity and faster resolution times. These typically involve well-scoped technical problems such as compilation errors and tool feature requests. In contrast, it performs less effectively on complex issues that demand deep project-specific understanding, such as system-level code debugging and refactoring. The most common deficiencies in unhelpful ChatGPT responses include incorrect information and lack of comprehensiveness. Our findings have wide implications including guiding developers on effective interaction strategies for issue resolution, informing the development of tools or frameworks to support optimal prompt design, and providing insights on fine-tuning LLMs for issue resolution tasks.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationSoftware Engineering ResearchOnline Learning and Analytics
Volltext beim Verlag öffnen