This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
VmambaIR: Visual State Space Model for Image Restoration
Citations: 112
Authors: 8
Year: 2025
Abstract
Image restoration is a critical task in low-level computer vision, aiming to restore high-quality images from degraded inputs. Various models, such as convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, and diffusion models (DMs), have been employed to address this problem with significant impact. However, CNNs have limitations in capturing long-range dependencies, DMs require large prior models and computationally intensive denoising steps, and transformers, despite their powerful modeling capabilities, have a complexity that grows quadratically with input image size. To tackle these challenges, we propose VmambaIR, one of the first works to introduce State Space Models (SSMs) with linear complexity into comprehensive image restoration tasks. Specifically, we utilize a U-Net architecture to stack our proposed Omni Selective Scan (OSS) blocks, each consisting of an OSS module and an Efficient Feed-Forward Network (EFFN). Our proposed omni selective scan mechanism overcomes the unidirectional modeling limitation of SSMs by efficiently modeling image information flows in all six directions to better exploit surrounding restoration information. Furthermore, we conducted a comprehensive evaluation of VmambaIR across multiple image restoration tasks, including image deraining, single image super-resolution, and real-world image super-resolution. Extensive experimental results demonstrate that VmambaIR achieves state-of-the-art (SOTA) performance with far fewer computational resources and parameters. Our research highlights the potential of state space models as promising alternatives to transformer and CNN architectures as foundational frameworks for next-generation low-level visual tasks.
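To make the idea of linear-complexity scanning in several directions more concrete, here is a minimal toy sketch in pure Python. It is not the authors' OSS module: it uses a fixed scalar recurrence instead of a selective (input-dependent) state space model, and it assumes that the six directions are forward and backward passes along the channel, height, and width axes, which the abstract does not spell out. The names `linear_scan`, `scan_axis`, and `six_direction_scan` are hypothetical.

```python
from itertools import product


def linear_scan(seq, a=0.5, b=1.0, c=1.0):
    """Toy state space recurrence h = a*h + b*x, y = c*h.

    Runs in O(len(seq)) time. The scalars a, b, c are fixed here (assumption);
    a selective SSM would make its parameters depend on the input.
    """
    h, out = 0.0, []
    for x in seq:
        h = a * h + b * x
        out.append(c * h)
    return out


def scan_axis(t, axis, reverse):
    """Apply linear_scan along one axis of a nested-list tensor t[c][h][w]."""
    dims = (len(t), len(t[0]), len(t[0][0]))
    out = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    others = [range(d) for i, d in enumerate(dims) if i != axis]
    for fixed in product(*others):
        # Index triples (c, h, w) visited along the scanned axis.
        idxs = []
        for k in range(dims[axis]):
            idx = list(fixed)
            idx.insert(axis, k)
            idxs.append(tuple(idx))
        if reverse:
            idxs.reverse()
        ys = linear_scan([t[c][h][w] for c, h, w in idxs])
        for (c, h, w), y in zip(idxs, ys):
            out[c][h][w] = y
    return out


def six_direction_scan(t):
    """Sum forward and backward scans along the channel, height, and width axes."""
    dims = (len(t), len(t[0]), len(t[0][0]))
    total = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for axis in range(3):
        for reverse in (False, True):
            s = scan_axis(t, axis, reverse)
            for c, h, w in product(*(range(d) for d in dims)):
                total[c][h][w] += s[c][h][w]
    return total
```

Each of the six passes visits every element once, so the total cost grows linearly with the number of elements, in contrast to the pairwise interactions of self-attention. Summing the six outputs is only one simple way to merge them and is chosen here for illustration.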
Similar works
A Computational Approach to Edge Detection
1986 · 28,910 citations
Compressed sensing
2006 · 23,003 citations
Pattern Recognition and Machine Learning
2007 · 22,075 citations
A theory for multiresolution signal decomposition: the wavelet representation
1989 · 20,957 citations
Reducing the Dimensionality of Data with Neural Networks
2006 · 20,737 citations