Abstract
Traditional video frame interpolation methods based on deep convolutional neural networks face challenges in handling large motions. Their performance is limited by the fact that convolutional operations cannot directly integrate the rich temporal and spatial information of inter-frame pixels, and these methods rely heavily on additional inputs such as optical flow to model motion. To address this issue, we develop a novel framework for video frame interpolation that uses a Transformer to efficiently model the long-range similarity of inter-frame pixels. Furthermore, to effectively aggregate spatio-temporal features, we design a novel attention mechanism divided into temporal attention and spatial attention. Specifically, spatial attention aggregates intra-frame information, integrating both the attention and convolution paradigms through a simple mapping approach, while temporal attention models the similarity of pixels along the time axis. This design processes the two types of information in parallel without extra computational cost, aggregating information across the space–time dimensions. In addition, we introduce a context extraction network and a multi-scale frame synthesis network to further improve the performance of the Transformer. We conduct extensive quantitative and qualitative experiments comparing our method with state-of-the-art methods on various benchmark datasets. On the Vimeo90K and UCF101 datasets, our model achieves improvements of 0.09 dB and 0.01 dB in PSNR over UPR-Net-large, respectively. On the Vimeo90K dataset, our model outperforms FLAVR by 0.07 dB while using only 40.56% of its parameters. The qualitative results show that, for complex and large-motion scenes, our method generates sharper and more realistic edges and details.
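To make the spatial/temporal split concrete, the sketch below shows a minimal, hypothetical PyTorch module that factorizes attention over frame features: spatial attention attends over the pixels of each frame, and temporal attention attends over the same pixel position across frames. This is an illustration of the general factorized-attention idea only, not the authors' implementation; the module name, tensor layout `(B, T, H, W, C)`, and use of `nn.MultiheadAttention` are assumptions, and the abstract's convolution-integrated spatial mapping is not reproduced here.

```python
# Minimal sketch (assumed design, not the paper's code) of factorized
# spatio-temporal attention over per-frame features of shape (B, T, H, W, C).
import torch
import torch.nn as nn


class FactorizedSpatioTemporalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Spatial attention: tokens are the H*W pixels within one frame.
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Temporal attention: tokens are the T occurrences of one pixel position.
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, H, W, C)
        b, t, h, w, c = x.shape

        # Spatial attention: aggregate intra-frame information.
        xs = x.reshape(b * t, h * w, c)
        xs_n = self.norm_s(xs)
        xs = xs + self.spatial_attn(xs_n, xs_n, xs_n, need_weights=False)[0]
        x = xs.reshape(b, t, h, w, c)

        # Temporal attention: model pixel similarity along the time axis.
        xt = x.permute(0, 2, 3, 1, 4).reshape(b * h * w, t, c)
        xt_n = self.norm_t(xt)
        xt = xt + self.temporal_attn(xt_n, xt_n, xt_n, need_weights=False)[0]
        return xt.reshape(b, h, w, t, c).permute(0, 3, 1, 2, 4)


if __name__ == "__main__":
    # Example: two input frames of 32x32 feature maps with 32 channels.
    feats = torch.randn(1, 2, 32, 32, 32)
    attn = FactorizedSpatioTemporalAttention(dim=32)
    print(attn(feats).shape)  # torch.Size([1, 2, 32, 32, 32])
```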