This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Applied Explainability for Large Language Models: A Comparative Study
Citations: 0
Authors: 1
Year: 2026
Abstract
Large Language Models (LLMs) achieve strong performance across natural language processing tasks, yet their internal decision processes remain difficult to interpret. This lack of transparency creates challenges in real-world deployments requiring trust, debugging, and accountability. This study presents a comparative analysis of three explainability techniques—Integrated Gradients, Attention Rollout, and SHAP—applied to a fine-tuned DistilBERT model on the SST-2 sentiment classification task. The methods are evaluated under a consistent experimental setup using qualitative criteria such as faithfulness, stability, and interpretability. The results show that gradient-based attribution methods provide the most stable and intuitive explanations, while attention-based approaches are computationally efficient but less aligned with prediction-relevant features. Model-agnostic methods offer flexibility but introduce computational overhead and variability. This work highlights practical trade-offs in explainability techniques and emphasizes the importance of evaluating them in realistic scenarios. The findings provide actionable insights for machine learning practitioners working with transformer-based NLP systems.
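The abstract does not include implementation details, so the following is only a minimal sketch of how one of the compared techniques, Integrated Gradients, can be applied to a DistilBERT sentiment classifier. It assumes the public Hugging Face checkpoint `distilbert-base-uncased-finetuned-sst-2-english` and Captum's `LayerIntegratedGradients`; neither is stated in the source, and the paper's actual setup may differ.

```python
# Sketch: token-level attributions via Layer Integrated Gradients on a
# DistilBERT model fine-tuned for SST-2 (assumed checkpoint, not from the paper).
import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_logits(input_ids, attention_mask):
    # Return class logits so Captum can attribute w.r.t. a chosen target class.
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

text = "The movie was surprisingly good."
enc = tokenizer(text, return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

# Baseline: same length, with all non-special tokens replaced by [PAD].
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
baseline_ids[0, 0] = tokenizer.cls_token_id
baseline_ids[0, -1] = tokenizer.sep_token_id

# Attribute through the embedding layer, since integer token ids cannot be
# interpolated directly.
lig = LayerIntegratedGradients(forward_logits, model.distilbert.embeddings)
target = int(forward_logits(input_ids, attention_mask).argmax())
attributions = lig.attribute(
    inputs=input_ids,
    baselines=baseline_ids,
    additional_forward_args=(attention_mask,),
    target=target,
    n_steps=50,
)

# Sum over the embedding dimension to get one score per token.
scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
for tok, score in zip(tokens, scores.tolist()):
    print(f"{tok:>12s}  {score:+.4f}")
```

Attention Rollout and SHAP would be wired up differently (the former from the model's attention matrices, the latter via a model-agnostic wrapper), but the attribution-per-token output format is comparable.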
Similar Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,463 citations
Generative Adversarial Nets
2023 · 19,843 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,259 citations
"Why Should I Trust You?"
2016 · 14,314 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,138 citations