Articles published on Video restoration
109 Search results
- Research Article
- 10.1080/13682199.2026.2624235
- Feb 4, 2026
- The Imaging Science Journal
- Javed Aymat Husen Shaikh + 2 more
High-quality video is a prerequisite for dependable analysis in surveillance, traffic monitoring, and autonomous navigation. Rain, fog, and haze mask scene structure and reduce detection, tracking, and recognition accuracy. These effects appear in both day and night footage, yet many prior methods are built for one setting and transfer poorly. We address this gap with a single framework that restores multi-weather videos across day and night. It has two parts: (i) the Dynamic Query Attention Transformer (DQAT) block, which uses dynamic convolutions to form adaptive queries and focus attention on informative regions; and (ii) the Temporal Alignment Block (TAB), which enforces flow-free temporal consistency using self-calibrated and deformable convolutions for alignment. Together, they recover fine detail while keeping frames stable under changing weather and illumination. Experiments on day and night multi-weather datasets show consistent gains over recent methods, with a compact design for practical deployment.
- Research Article
- 10.1109/tpami.2026.3668757
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Hao Li + 4 more
The key success of existing video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information that is usually achieved by a temporal propagation with alignment strategies. However, inaccurate alignment usually leads to significant artifacts that will be accumulated during propagation and thus affect video restoration. Moreover, only propagating the same timestep features forward or backward does not handle the videos with complex motion or occlusion. To address these issues, we propose a collaborative feedback discriminative (CFD) method to correct inaccurate aligned features and better model spatial and temporal information for VSR. Specifically, we first develop a discriminative alignment correction (DAC) method to reduce the influences of the artifacts caused by inaccurate alignment. Then, we propose a collaborative feedback propagation (CFP) module based on feedback and gating mechanisms to explore spatial and temporal information of different timestep features from forward and backward propagation simultaneously. Finally, we embed the proposed DAC and CFP into commonly used VSR networks to verify the effectiveness of our method. Experimental results demonstrate that our method improves the performance of existing VSR models while maintaining a lower model complexity.
- Research Article
- 10.1109/tip.2026.3655117
- Jan 1, 2026
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Pengfei Chen + 5 more
Video Quality Assessment (VQA) strives to computationally emulate human perceptual judgments and has garnered significant attention given its widespread applicability. However, existing methodologies face two primary impediments: (1) limited proficiency in evaluating samples at quality extremes (e.g., severely degraded or near-perfect videos), and (2) insufficient sensitivity to nuanced quality variations arising from a misalignment with human perceptual mechanisms. Although vision-language models offer promising semantic understanding, their reliance on visual encoders pre-trained for high-level tasks often compromises their sensitivity to low-level distortions. To surmount these challenges, we propose the Restoration-Assisted Multi-modality VQA (RAM-VQA) framework. Uniquely, our approach leverages video restoration as a proxy to explicitly model distortion-sensitive features. The framework operates through two synergistic stages: a prompt learning stage that constructs a quality-aware textual space using triple-level references (degraded, restored, and pristine) derived from the restoration process, and a dual-branch evaluation stage that integrates semantic cues with technical quality indicators via spatio-temporal differential analysis. Extensive experiments demonstrate that RAM-VQA achieves state-of-the-art performance across diverse benchmarks, exhibiting superior capability in handling extreme-quality content while ensuring robust generalization.
- Research Article
- 10.1109/jstsp.2025.3627128
- Dec 1, 2025
- IEEE Journal of Selected Topics in Signal Processing
- Lu Liu + 11 more
MoA-VR: A Mixture-of-Agents System Toward All-in-One Video Restoration
- Research Article
- 10.4018/jcit.393279
- Nov 18, 2025
- Journal of Cases on Information Technology
- Xiaohua Ge
In digital archiving and online dissemination, digital artworks often face problems such as low resolution, color distortion, and texture loss, which affect the viewing experience and artistic value exploration. This paper integrates a diffusion probability model with style preservation loss to create a digital art image reconstruction framework. Through a reversible diffusion process, it restores global details and uses an adaptive style encoder to maintain local stroke consistency. Experiments on various art datasets show that the method outperforms traditional algorithms in detail fidelity, subjective aesthetics, and cross-genre generalization, emphasizing both pixel-level accuracy and artistic expression integrity. The study provides a new technical path for digital art preservation and recreation, laying the foundation for deep generation models in cultural heritage protection. Its modular design facilitates adaptation to 3D and video restoration.
- Research Article
- 10.1016/j.neunet.2025.107700
- Nov 1, 2025
- Neural networks : the official journal of the International Neural Network Society
- Yuanshuo Cheng + 5 more
Degradation-Guided cross-consistent deep unfolding network for video restoration under diverse weathers.
- Research Article
- 10.54097/2sb9jw88
- Aug 29, 2025
- Journal of Computing and Electronic Information Management
- Maokang Sun
This paper proposes a simplified audio-guided video face restoration method. The goal is to recover high-quality, temporally consistent face videos. We designed a multi-stage framework that integrates audio and visual modalities through simple yet effective components. Specifically, we extract low-level HOG features from video frames and MFCC features from audio. We then utilize a simplified 3D convolutional network to predict dictionary indices guided by both modalities. A pre-trained TS-VQGAN decoder reconstructs high-quality frames. Simplified spatio-temporal fidelity modules and optical flow smoothing techniques are simultaneously applied to enhance spatio-temporal consistency. Experimental results on the VoxCeleb2 dataset demonstrate that our method outperforms single-modal methods such as BasicVSR++ and VQF in terms of PSNR, SSIM, and LPIPS metrics. This indicates that cross-modal fusion can still deliver consistent performance improvements in practical video restoration tasks even under a simplified structure.
- Research Article
- 10.35320/ij.168
- Aug 23, 2025
- International Association of Sound and Audiovisual Archives (IASA) Journal
- Andrea Walker + 1 more
When a wildfire ripped through the Jagger Library at the University of Cape Town (UCT) on 18 April 2021, firefighters engaged in strenuous efforts to control the flames. Within days, a formal disaster management project was initiated, led by UCT Libraries. While the Reading Room was destroyed by the fire, the basements housing Special Collections survived. The flooded basement was drained, and teams of volunteers organised by UCT Libraries extracted the materials, carrying labelled crates to the surface. In the lowest basement, significant flooding occurred, and audiovisual materials below the water mark, including video and audio tapes, suffered water damage and instances of mould infestation. Following extraction, the Audiovisual Digitisation Project (AVDP) was launched to mitigate the potential loss of data on already-fragile carriers. The project formed part of a broader, ongoing disaster recovery response to salvage and recover the Special Collections that survived the Jagger Library fire. With nearly 35,000 unique items affected and a range of damage to the carriers, the project was expedited at a scale of operation unprecedented within UCT Libraries. With little time to plan, the team had to execute a comprehensive audit and digitisation of all the materials exposed to moisture during the fire. While planning is a critical part of any project, the circumstances of the AVDP were unprecedented and required significant agility and dedication from the team involved. The digitisation itself was outsourced to a company known as Video Restoration Television (VRTV), with UCT Libraries working closely with the vendor to digitise these fragile holdings.
This article outlines the processes involved in the successful completion of the AVDP, unique not only as a digitisation project but also as a disaster recovery project, which resulted in the comprehensive audit and digitisation of the entire audiovisual archive in Special Collections.
- Research Article
- 10.1016/j.dib.2025.111815
- Aug 1, 2025
- Data in brief
- Duc-Minh Nguyen + 3 more
ViCoW: A dataset for colorization and restoration of Vietnam War imagery.
- Research Article
- 10.1145/3727147
- Jun 30, 2025
- ACM Transactions on Multimedia Computing, Communications, and Applications
- Claudio Rota + 3 more
We present a novel Convolutional Neural Network that exploits the Laplacian decomposition technique, typically used in traditional image processing, to restore videos compressed with the High-Efficiency Video Coding (HEVC) algorithm. The proposed method decomposes the compressed frames into multi-scale frequency bands using the Laplacian decomposition, restores each band using the purpose-built Multi-frame Residual Laplacian Network (MRLN), and finally recomposes the restored bands to obtain the restored frames. By leveraging the multi-scale frequency representation of compressed frames provided by the Laplacian decomposition, MRLN can effectively reduce compression artifacts and restore image details at a reduced computational cost. In addition, our method can be easily instantiated in various versions to control the tradeoff between efficiency and effectiveness, making it a versatile solution for scenarios with constrained computational resources. Experimental results on the MFQEv2 benchmark dataset show that our method achieves state-of-the-art performance in HEVC-compressed video restoration with lower model complexity and shorter runtime than existing methods. The project page is available at https://github.com/claudiom4sir/LaplacianVCAR.
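The decompose-restore-recompose pipeline described above can be illustrated with a minimal sketch. This is an illustrative simplification, not the paper's method: it builds a same-resolution Laplacian stack (no downsampling) rather than a true pyramid, uses a plain Gaussian blur as the low-pass filter, and the function names are ours; in the actual pipeline each band would pass through the learned MRLN restorer before recomposition.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    # Separable Gaussian blur: 1-D convolution along rows, then columns.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)

def laplacian_decompose(frame, levels=3):
    """Split a frame into band-pass layers plus a final low-pass residual."""
    bands, current = [], frame.astype(np.float64)
    for _ in range(levels):
        low = gaussian_blur(current)
        bands.append(current - low)   # high-frequency band at this scale
        current = low
    bands.append(current)             # low-pass residual
    return bands

def laplacian_recompose(bands):
    """Exact inverse: the band sum telescopes back to the original frame."""
    return sum(bands)
```

Because the decomposition telescopes, recomposition is lossless, so any quality gain comes entirely from the per-band restoration step inserted between the two calls.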
- Research Article
- 10.31891/csit-2025-2-21
- Jun 26, 2025
- Computer systems and information technologies
- Mykola Maksymiv + 1 more
Video enhancement aims to restore high-quality video from degraded inputs affected by noise, blur, compression artifacts, or resolution loss. Most existing models assume a fixed degradation during training, limiting their robustness to real-world scenarios with unknown and varying distortions. In this paper, we propose a quality-aware video enhancement framework that explicitly estimates the input degradation level and conditions the restoration process accordingly. Our method consists of a lightweight degradation level estimation module that predicts a quality score for each frame, and a conditional enhancement network that dynamically adjusts restoration strength based on the estimated degradation. Unlike static models trained for a single degradation type, our system adapts to diverse distortions, applying appropriate enhancement strategies for different quality levels. Extensive experiments on standard datasets such as Vimeo-90K and REDS demonstrate that our method consistently outperforms strong baselines, including BasicVSR, EDVR, and others, particularly under blind degradations. The proposed framework improves PSNR, SSIM, and LPIPS scores, while maintaining temporal consistency and introducing only minor computational overhead. These results highlight the potential of explicit quality estimation for achieving robust and perceptually faithful video restoration across varying real-world conditions.
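The estimate-then-condition idea can be sketched as follows. This is a hypothetical stand-in for the paper's learned modules: gradient variance serves as a crude per-frame sharpness proxy, the restoration strength is a simple blend weight, and `restore_fn` is a placeholder for any frame restorer.

```python
import numpy as np

def degradation_score(frame):
    """Crude degradation estimate (NOT the paper's learned module):
    gradient variance as a sharpness proxy, mapped to [0, 1],
    where 1 means heavily degraded / low detail."""
    gx = np.diff(frame, axis=1)
    gy = np.diff(frame, axis=0)
    sharpness = gx.var() + gy.var()
    return float(1.0 / (1.0 + 10.0 * sharpness))

def conditional_enhance(frame, restore_fn):
    """Blend restored and input frames by estimated degradation level:
    clean frames are left mostly untouched, degraded frames receive
    the full restoration strength."""
    s = degradation_score(frame)
    return (1 - s) * frame + s * restore_fn(frame)
```

The design point the abstract makes is exactly this conditioning: a single model can serve mixed-quality inputs if the enhancement strength is a function of an explicit per-frame quality estimate rather than a fixed training-time assumption.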
- Research Article
- 10.31926/but.mif.2025.5.67.2.16
- Jun 5, 2025
- Bulletin of the Transilvania University of Brasov. Series III: Mathematics and Computer Science
- T Barbu + 1 more
A spatio-temporal video filtering approach is proposed in this article. The video restoration technique introduced here successfully removes signal-independent noise, such as additive white Gaussian noise (AWGN), that deteriorates frame sequences. The proposed framework is based on a novel 3D fourth-order nonlinear reaction-diffusion model that considerably reduces AWGN, overcomes common side effects, and properly handles the inter-frame correlation problem. A rigorous mathematical treatment of this model is performed and its validity investigated. A numerical approximation algorithm that solves this nonlinear partial differential equation (PDE)-based model is then provided and applied successfully in the video denoising tests described here.
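As a rough illustration of PDE-based spatio-temporal filtering, the sketch below runs an explicit second-order Perona-Malik-style diffusion on a (frames, height, width) volume. This is a deliberately simplified assumption-laden stand-in, not the authors' fourth-order reaction-diffusion model: each step moves every voxel toward its spatio-temporal neighbours, weighted by an edge-stopping diffusivity so strong edges are preserved.

```python
import numpy as np

def diffusion_denoise(video, iters=30, dt=0.1, kappa=0.5):
    """Simplified spatio-temporal diffusion denoising (second-order sketch,
    NOT the paper's fourth-order model). video: (T, H, W) array."""
    u = video.astype(np.float64).copy()
    for _ in range(iters):
        flux = np.zeros_like(u)
        for axis in range(3):  # one temporal + two spatial axes
            fwd = np.roll(u, -1, axis) - u
            bwd = np.roll(u, 1, axis) - u
            # Edge-stopping diffusivity g(d) = 1 / (1 + (d / kappa)^2):
            # large differences (edges) diffuse less than small ones (noise).
            flux += fwd / (1 + (fwd / kappa) ** 2) + bwd / (1 + (bwd / kappa) ** 2)
        u += dt * flux  # explicit Euler step
    return u
```

Treating time as a third diffusion axis is what makes the scheme "3D": temporal neighbours contribute to the smoothing exactly as spatial ones do, which is the inter-frame correlation the abstract refers to.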
- Research Article
- 10.1016/j.mex.2025.103402
- Jun 1, 2025
- MethodsX
- Mohana Priya P + 1 more
Video prediction based on temporal aggregation and recurrent propagation for surveillance videos
- Research Article
- 10.1609/aaai.v39i2.32189
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
- Cong Cao + 3 more
Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various tasks of image restoration and enhancement. However, directly applying them to video restoration and enhancement results in severe temporal flickering artifacts. In this paper, we propose the first framework for zero-shot video restoration and enhancement based on the pre-trained image diffusion model. By replacing the spatial self-attention layer with the proposed short-long-range (SLR) temporal attention layer, the pre-trained image diffusion model can take advantage of the temporal correlation between frames. We further propose temporal consistency guidance, spatial-temporal noise sharing, and an early stopping sampling strategy to improve temporally consistent sampling. Our method is a plug-and-play module that can be inserted into any diffusion-based image restoration or enhancement methods to further improve their performance. Experimental results demonstrate the superiority of our proposed method.
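The short-range half of such a temporal attention layer can be sketched in a few lines. This is a hypothetical simplification of the idea, not the authors' SLR layer: there are no learned query/key/value projections, only the short-range window is shown, and each token attends to the same spatial location across nearby frames instead of to other locations within its own frame.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(feats, window=2):
    """Sketch of short-range temporal attention.

    feats: (T, N, C) -- T frames, N spatial tokens, C channels.
    Each token at frame t attends over frames [t - window, t + window]."""
    T, N, C = feats.shape
    out = np.empty_like(feats)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        keys = feats[lo:hi]                                   # (W, N, C)
        scores = np.einsum('nc,wnc->wn', feats[t], keys) / np.sqrt(C)
        weights = softmax(scores, axis=0)                     # over the window
        out[t] = np.einsum('wn,wnc->nc', weights, keys)       # weighted mix
    return out
```

Because attention mixes information only across time at each spatial position, static content passes through unchanged while frame-to-frame flicker is averaged away, which is the temporal-consistency effect the abstract targets.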
- Research Article
- 10.1609/aaai.v39i8.32944
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
- Lianxin Xie + 7 more
Blind face video restoration aims to restore high-fidelity details from videos subjected to complex and unknown degradations. This task poses a significant challenge of managing temporal heterogeneity while at the same time maintaining stable face attributes. In this paper, we introduce a Discrete Prior-based Temporal-Coherent content prediction transformer to address the challenge, and our model is referred to as DP-TempCoh. Specifically, we incorporate a spatial-temporal-aware content prediction module to synthesize high-quality content from discrete visual priors, conditioned on degraded video tokens. To further enhance the temporal coherence of the predicted content, a motion statistics modulation module is designed to adjust the content, based on discrete motion priors in terms of cross-frame mean and variance. As a result, the statistics of the predicted content can match with that of real videos over time. By performing extensive experiments, we verify the effectiveness of the design elements and demonstrate the superior performance of our DP-TempCoh in both synthetically and naturally degraded video restoration.
- Research Article
- 10.11591/ijai.v14.i2.pp1518-1530
- Apr 1, 2025
- IAES International Journal of Artificial Intelligence (IJ-AI)
- Redouane Lhiadi + 2 more
Video processing is essential in entertainment, surveillance, and communication. This research presents a robust framework that improves video clarity and reduces bitrate via advanced restoration and compression methods. The proposed framework combines several deep learning models, including super-resolution, deblurring, denoising, and frame interpolation, with an efficient compression model. Video frames are first compressed using the libx265 codec to reduce bitrate and storage needs. After compression, restoration techniques address issues such as noise, blur, and loss of detail. The video restoration transformer (VRT) uses deep learning to greatly enhance video quality by reducing compression artifacts. Frame resolution is improved by the super-resolution model, motion blur is corrected by the deblurring model, and noise is reduced by the denoising model, resulting in clearer frames. Frame interpolation creates additional frames between existing ones for a smoother viewing experience. Experimental findings show that this system successfully improves video quality and reduces artifacts, providing better perceptual quality and fidelity. Its real-time processing capabilities make it well-suited for video streaming, surveillance, and digital cinema.
- Research Article
- 10.29040/ijcis.v5i1.208
- Jan 30, 2025
- International Journal of Computer and Information System (IJCIS)
- Xuzhong Jia + 3 more
The preservation and enhancement of historical news video archives represent a critical challenge in digital archiving. This paper proposes a novel dual-stream approach leveraging modified conditional Generative Adversarial Networks (GANs) for simultaneous enhancement of video and audio quality in historical news footage. The framework incorporates parallel processing pathways for video and audio restoration, connected through an innovative feature fusion mechanism. The architecture introduces several key improvements, including temporal-aware processing modules, multi-scale discriminators, and adaptive feature fusion strategies. Comprehensive experiments conducted on a diverse dataset of historical news broadcasts from 1960-2000 demonstrate significant improvements over existing methods, achieving a 35.2% increase in Peak Signal-to-Noise Ratio (PSNR) and 29.8 dB improvement in audio Signal-to-Noise Ratio (SNR). The proposed framework maintains temporal coherence while preserving content authenticity, addressing critical challenges in archival media restoration. Quantitative evaluations show superior performance across multiple quality metrics, while qualitative assessments confirm enhanced perceptual quality and historical accuracy preservation. The experimental results validate the effectiveness of the dual-stream approach in historical news video restoration, establishing a new benchmark for automated archival media enhancement.
- Research Article
- 10.1109/tpami.2025.3616607
- Jan 1, 2025
- IEEE transactions on pattern analysis and machine intelligence
- Hao Nan Sheng + 3 more
A common assumption in matrix completion (MC) and tensor completion (TC) is that the missing locations are sampled randomly. However, in real-world scenarios, the unobserved elements are often not arbitrarily located, and may concentrate within entire rows or columns. We refer to this missing mechanism as structural missingness, and traditional MC and TC schemes suffer drastic degradation under these circumstances. This work addresses the challenge of restoring structural missingness by introducing a novel framework for simultaneously reconstructing multiple matrices, called multi-matrix completion (MMC). In MMC, tri-factorization across matrices captures the correlation between matrices, while Tikhonov regularization exploits the correlation within each matrix. This design enables MMC to efficiently handle both random and structural missingness. In addition, MMC is not affected by smoothness along matrices, which makes it suitable for a wider variety of data than Fourier-transform-based TC methods. The alternating direction method of multipliers is utilized to solve the resultant optimization problem. The global convergence of the algorithm is supported by comprehensive theoretical analyses. We demonstrate the versatility of MMC through extensive experiments in image and video restoration, and showcase its superior performance in comparison to traditional MC and TC methods.
- Research Article
- 10.1109/tip.2025.3534033
- Jan 1, 2025
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Junye Chen + 4 more
This paper aims to restore original background images in watermarked videos, overcoming challenges posed by traditional approaches that fail to handle the temporal dynamics and diverse watermark characteristics effectively. Our method introduces a unique framework that first "decouples" the extraction of prior knowledge-such as common-sense knowledge and residual background details-from the temporal modeling process, allowing for independent handling of background restoration and temporal consistency. Subsequently, it "couples" these extracted features by integrating them into the temporal modeling backbone of a video inpainting (VI) framework. This integration is facilitated by a specialized module, which includes an intrinsic background image prediction sub-module and a dual-branch frame embedding module, designed to reduce watermark interference and enhance the application of prior knowledge. Moreover, a frame-adaptive feature selection module dynamically adjusts the extraction of prior features based on the corruption level of each frame, ensuring their effective incorporation into the temporal processing. Extensive experiments on YouTube-VOS and DAVIS datasets validate our method's efficiency in watermark removal and background restoration, showing significant improvement over state-of-the-art techniques in visible image watermark removal, video restoration, and video inpainting.
- Research Article
- 10.1109/tip.2025.3642622
- Jan 1, 2025
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Wenming Weng + 4 more
Video restoration from low-resolution and low-frame-rate blurry sources remains challenging due to insufficient data priors. In this paper, we propose BVSR-EvD, leveraging event cameras and diffusion models to boost blurry video space-time super-resolution. Specifically, we identify three distinct data priors from event-video dual modalities: motion prior from events, content prior from videos, and physical prior from their integration, contributing to temporal stability, content preservation, and detail enhancement respectively. To effectively utilize these data priors, BVSR-EvD creates the Trident Diffusion Model (Trident-DM), which decomposes each denoising step into trident decoupling and adaptive self-composition stages. The former employs single-modal and dual-modal meta-networks to extract the three unique data priors, while the latter dynamically integrates them through learned prior-aware weight maps. BVSR-EvD achieves up to $\times 8$ spatial super-resolution and $\times 64$ temporal super-resolution from blurry videos, surpassing existing methods on public video datasets.