This is an overview page with metadata for this scientific work. The full article is available from the publisher.
MAdaKron: A mixture-of-AdaKron adapters
Citations: 0
Authors: 3
Year: 2025
Abstract
• We design a new Adapter module to efficiently fine-tune a Pre-trained Language Model.
• AdaKron achieves better performance than Full Fine-Tuning and original Adapters.
• We combine AdaKron within a Mixture-of-Experts system to define MAdaKron.
• Extensive evaluation on 18 NLP datasets shows the effectiveness of our approaches.
• We perform an ablation study to evaluate the impact of each component of MAdaKron.

Adapting pre-trained Large Language Models to specific tasks has traditionally involved updating all of their parameters. However, this approach becomes impractical for models containing billions of parameters. This has led to intensive research on Parameter-Efficient Fine-Tuning (PEFT) techniques, which aim to train a small fraction of the model's parameters while maintaining performance comparable to Full Fine-Tuning. A popular method is the Adapter, i.e. small trainable layers added to pre-trained models. We recently presented AdaKron, an Adapter-based PEFT technique that leverages the Kronecker product to combine the outputs of two small networks, training less than 0.55% of the model's parameters while outperforming Full Fine-Tuning. In this paper, we put forward a novel technique called MAdaKron, a Mixture-of-AdaKron model, which combines AdaKron with a Mixture-of-Experts approach. MAdaKron combines the flexibility of a Mixture-of-Experts architecture with the efficiency of AdaKron to further enhance its performance. We then extensively evaluate MAdaKron on eighteen Natural Language Understanding and Generation benchmarks, showing that it achieves performance on par with or even better than recent state-of-the-art PEFT methods, while reducing the number of trainable parameters. These findings highlight MAdaKron as an efficient solution for fine-tuning LLMs, offering substantial computational cost reductions without sacrificing performance.
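To make the two ideas in the abstract concrete, here is a minimal numpy sketch of the general pattern: each expert is a Kronecker-factorized adapter (a full d×d update built from two small matrices, so only a1² + a2² parameters are trained instead of d²), and a softmax router mixes the experts' outputs before a residual connection. All names, shapes, the tanh nonlinearity, and the dense softmax gate are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16          # toy hidden size of a transformer layer
a1, a2 = 4, 4   # factor sizes; kron((a1,a1), (a2,a2)) -> (d, d) with d = a1*a2

def make_expert():
    """One hypothetical Kronecker-factorized adapter expert: two small
    trainable matrices whose Kronecker product forms the full d x d update."""
    A = rng.normal(scale=0.02, size=(a1, a1))
    B = rng.normal(scale=0.02, size=(a2, a2))
    return A, B

def expert_forward(x, A, B):
    # The Kronecker product expands a1*a1 + a2*a2 = 32 trained parameters
    # into a (d, d) = 256-entry adapter weight.
    W = np.kron(A, B)
    return np.tanh(x @ W)     # small nonlinear adapter output (assumed activation)

def mixture_forward(x, experts, gate_W):
    """Hypothetical mixture-of-adapters step: a softmax router weights
    each expert's output, then a residual connection adds it back to x."""
    logits = x @ gate_W                       # (n_experts,)
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    mix = sum(w * expert_forward(x, A, B)
              for w, (A, B) in zip(weights, experts))
    return x + mix                            # residual, as in standard Adapters

n_experts = 3
experts = [make_expert() for _ in range(n_experts)]
gate_W = rng.normal(scale=0.02, size=(d, n_experts))

x = rng.normal(size=d)
y = mixture_forward(x, experts, gate_W)
print(y.shape)   # (16,)
```

The parameter-efficiency argument is visible in the counts: each expert trains 32 values rather than the 256 a dense d×d adapter would need, and the gap grows quadratically with the hidden size.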