This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Lost in the middle? Examining scoring reliability and position bias in LLM-based automated essay scoring
Citations: 0
Authors: 1
Year: 2026
Abstract
This study investigates position bias in ChatGPT’s scoring patterns for automated essay scoring, with a focus on primacy and recency effects. Position bias, originating from the serial position effect in cognitive psychology, refers to the tendency of Large Language Models (LLMs) to emphasize the introduction and conclusion of a text while potentially neglecting content in the middle sections. Using 192 synthetic essays across varying lengths and section qualities, this research explores whether ChatGPT disproportionately weighs the quality of introductions and conclusions compared to body paragraphs. Statistical analyses reveal that while ChatGPT successfully differentiates between strong and weak sections, no consistent evidence supports the presence of systematic primacy or recency effects in overall scoring. Domain-specific analyses further indicate that rubric categories such as grammar and mechanics are sensitive to errors throughout essays, while content and organization are more heavily influenced by body quality. The findings suggest that ChatGPT’s scoring patterns are largely balanced, with minimal signs of position bias, thereby enhancing the validity of its use for automated scoring. This research highlights the need for continued evaluation of AI-based grading systems to ensure fairness and reliability while proposing avenues for future exploration in LLM-driven assessments.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,758 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,666 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,220 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,896 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations