This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The Role of Prompt Engineering in AI Essay Scoring: A Comparative Analysis of ChatGPT's Scoring Stability Across Varying Prompt Designs
Citations: 0
Authors: 3
Year: 2026
Abstract
The rapid adoption of large language models (LLMs), including ChatGPT, in educational contexts has renewed interest in their potential use for automated essay scoring (AES). While prior studies report moderate to strong agreement between ChatGPT and human raters, the role of prompt engineering in shaping scoring reliability and validity remains insufficiently examined. This study investigates how different prompt designs influence the consistency and human alignment of ChatGPT-based essay scoring. Using a stratified sample of 100 learner essays from the ICNALE corpus, each essay was evaluated under four prompt conditions with increasing levels of instructional structure. Scoring outcomes were analyzed using descriptive statistics, intraclass correlation coefficients (ICC), repeated-measures ANOVA, and error-based metrics. The results reveal statistically significant differences across prompt conditions, with rubric-aligned prompts yielding substantially lower score variability and the highest agreement with human ratings (average-measure ICC > 0.92). Distributional analyses further demonstrate that structured prompts effectively constrain stochastic scoring behavior. These findings provide empirical evidence that prompt design is a critical methodological factor in LLM-based AES and that reliability limitations commonly attributed to ChatGPT can be mitigated through principled prompt engineering. The study offers practical guidance for the responsible deployment of AI-assisted assessment systems and contributes to the growing literature on prompt-sensitive behavior in educational applications of LLMs.
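For readers unfamiliar with the average-measure ICC mentioned in the abstract, the following is a minimal sketch of how such a coefficient can be computed. The abstract does not state which ICC model the authors used; the two-way random-effects, absolute-agreement form, ICC(2,k), is assumed here purely for illustration. The function name `icc2k` and the toy scores are hypothetical and are not taken from the paper.

```python
def icc2k(scores):
    """Average-measure ICC, two-way random effects, absolute agreement.

    Assumption: this is ICC(2,k); the paper's abstract does not name
    the model it used.
    scores: list of rows, one row per essay, one column per rater.
    """
    n = len(scores)        # number of essays (subjects)
    k = len(scores[0])     # number of raters (or scoring runs)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-essay mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (msc - mse) / n)


# Toy data (hypothetical): two raters who agree exactly on three essays.
print(icc2k([[1, 1], [2, 2], [3, 3]]))  # perfect agreement -> 1.0
```

In a setting like the one the abstract describes, each row would hold one essay's scores and the columns would hold the human rating and the model rating under a given prompt condition, so that one coefficient is obtained per condition. This pairing is an assumption about the design, not a detail confirmed by the abstract.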
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,774 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,685 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,244 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations