This is an overview page with metadata for this scientific work. The full article is available from the publisher.

SteadyEval: Robust LLM Exam Graders via Adversarial Training and Distillation

2026 · 0 citations · Computers · Open Access

Abstract

Large language models (LLMs) are increasingly used as rubric-guided graders for short-answer exams, but their decisions can be unstable across prompts and vulnerable to answer-side prompt injection. In this paper, we study SteadyEval, a guardrailed exam-grading pipeline in which an adversarially trained LoRA filter (SteadyEval-7B-deep) preprocesses student answers to remove answer-side prompt injection, after which the original Mistral-7B-Instruct rubric-guided grader assigns the final score. We compare it against a baseline pipeline, also built on Mistral-7B-Instruct, that scores student answers directly. Using two rubric-guided short-answer datasets in machine learning and computer networking, we generate grouped families of clean answers and four classes of answer-side attacks, and we evaluate the impact of these attacks on score shifts, attack success rates, stability across prompt variants, and alignment with human graders. On the pooled dataset, answer-side attacks inflate grades in the unguarded baseline by an average of about +1.2 points on a 1–10 scale and substantially increase score dispersion across prompt variants. The guardrailed pipeline largely removes this systematic grade inflation and reduces instability for many items, especially in the machine-learning exam. On clean answers, its mean absolute error with respect to human reference scores stays in a similar range to that of the unguarded baseline, with a conservative shift in networking that motivates per-course calibration. Chief-panel comparisons further show that the guardrailed pipeline tracks human grading more closely on machine-learning items but tends to under-score networking answers. These findings are best interpreted as a proof-of-concept guardrail and require per-course validation and calibration before operational use.
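The evaluation quantities named in the abstract (score shift, attack success rate, dispersion across prompt variants) can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definitions, so the formulas below are assumptions: the function names, the `min_gain` threshold, and the toy scores are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def mean_score_shift(clean, attacked):
    """Average of (attacked - clean) over paired scores for the same answers."""
    return mean(a - c for c, a in zip(clean, attacked))


def attack_success_rate(clean, attacked, min_gain=1.0):
    """Fraction of pairs whose score rose by at least min_gain.

    Assumption: an attack 'succeeds' when it inflates the grade by
    min_gain points or more; the paper may define success differently.
    """
    gains = [a - c for c, a in zip(clean, attacked)]
    return sum(g >= min_gain for g in gains) / len(gains)


def prompt_dispersion(scores_across_variants):
    """Population standard deviation of one answer's scores across prompt variants."""
    return pstdev(scores_across_variants)


# Toy, made-up scores on a 1-10 scale (not data from the paper).
clean_scores = [4, 6, 7, 5]
attacked_scores = [6, 6, 9, 5]

shift = mean_score_shift(clean_scores, attacked_scores)
rate = attack_success_rate(clean_scores, attacked_scores)
spread = prompt_dispersion([4, 6])
```

Under these assumed definitions, a guardrail that works as described would push the mean shift and the success rate toward zero while leaving scores on clean answers largely unchanged.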

Institutions

"Dunarea de Jos" University of Galati (RO)

Topics

Adversarial Robustness in Machine Learning, Topic Modeling, Artificial Intelligence in Healthcare and Education
