Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
SteadyEval: Robust LLM Exam Graders via Adversarial Training and Distillation
0
Zitationen
5
Autoren
2026
Jahr
Abstract
Large language models (LLMs) are increasingly used as rubric-guided graders for short-answer exams, but their decisions can be unstable across prompts and vulnerable to answer-side prompt injection. In this paper, we study SteadyEval, a guardrailed exam-grading pipeline in which an adversarially trained LoRA filter (SteadyEval-7B-deep) preprocesses student answers to remove answer-side prompt injection, after which the original Mistral-7B-Instruct rubric-guided grader assigns the final score. We build two exam-grading pipelines on top of Mistral-7B-Instruct: a baseline pipeline that scores student answers directly, and a guardrailed pipeline in which a LoRA-based filter (SteadyEval-7B-deep) first removes injection content from the answer and a downstream grader then assigns the final score. Using two rubric-guided short-answer datasets in machine learning and computer networking, we generate grouped families of clean answers and four classes of answer-side attacks, and we evaluate the impact of these attacks on score shifts, attack success rates, stability across prompt variants, and alignment with human graders. On the pooled dataset, answer-side attacks inflate grades in the unguarded baseline by an average of about +1.2 points on a 1–10 scale, and substantially increase score dispersion across prompt variants. The guardrailed pipeline largely removes this systematic grade inflation and reduces instability for many items, especially in the machine-learning exam, while keeping mean absolute error with respect to human reference scores in a similar range to the unguarded baseline on clean answers, with a conservative shift in networking that motivates per-course calibration. Chief-panel comparisons further show that the guardrailed pipeline tracks human grading more closely on machine-learning items, but tends to under-score networking answers. These findings are best interpreted as a proof-of-concept guardrail and require per-course validation and calibration before operational use.
Ähnliche Arbeiten
Rethinking the Inception Architecture for Computer Vision
2016 · 30.396 Zit.
MobileNetV2: Inverted Residuals and Linear Bottlenecks
2018 · 24.505 Zit.
CBAM: Convolutional Block Attention Module
2018 · 21.400 Zit.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020 · 21.334 Zit.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015 · 18.524 Zit.