MS-LSTM: Exploring spatiotemporal multiscale representations in video prediction domain

Zhifeng Ma, Hao Zhang, Jie Liu

doi:10.1016/j.asoc.2023.110731

Abstract

Drastic variation of motion across the spatial and temporal dimensions makes video prediction extremely challenging. Existing RNN models improve performance by deepening or widening the model. They obtain multi-scale video features only by stacking layers, which is inefficient and incurs prohibitive training costs (memory, FLOPs, and training time). In contrast, this paper proposes MS-LSTM, a spatiotemporal multi-scale model designed wholly from a multi-scale perspective. On top of stacked layers, MS-LSTM incorporates two additional efficient multi-scale designs to fully capture spatiotemporal context. Concretely, we employ LSTMs with mirrored pyramid structures to construct spatial multi-scale representations and LSTMs with different convolution kernels to construct temporal multi-scale representations. We theoretically analyze the training cost and performance of MS-LSTM and its components. Detailed comparison experiments with twelve baseline models on four video datasets show that MS-LSTM achieves better performance at lower training cost.
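To make the "mirrored pyramid" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the spatial resolution is halved layer by layer down to a bottleneck and then restored symmetrically, and that the cost of a convolutional layer scales with the number of spatial positions it processes (equal channels and kernel sizes across layers). The function names, the layer count, the frame size, and the downsampling factor are all hypothetical choices for illustration.

```python
from typing import List, Tuple


def mirrored_pyramid_scales(
    num_layers: int, height: int, width: int, factor: int = 2
) -> List[Tuple[int, int]]:
    """Per-layer (height, width) for a down-then-up pyramid (assumed layout)."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    # Layers on the downsampling side, including the bottleneck.
    half = (num_layers + 1) // 2
    down = [(height // factor**i, width // factor**i) for i in range(half)]
    # Mirror the downsampling side to get the upsampling side.
    up = down[: num_layers - half][::-1]
    return down + up


def relative_cost(num_layers: int, height: int, width: int, factor: int = 2) -> float:
    """Spatial positions processed by the pyramid, relative to a stack
    that keeps every layer at full resolution (assumption: cost is
    proportional to height * width per layer)."""
    scales = mirrored_pyramid_scales(num_layers, height, width, factor)
    pyramid = sum(h * w for h, w in scales)
    flat = num_layers * height * width
    return pyramid / flat


if __name__ == "__main__":
    print(mirrored_pyramid_scales(6, 64, 64))
    print(relative_cost(6, 64, 64))
```

Under these assumptions, a six-layer pyramid on 64x64 frames processes well under half the spatial positions of a six-layer full-resolution stack, which is one intuition for how a multi-scale layout can reduce memory and FLOPs. The cost analysis in the paper itself should be consulted for the actual figures.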

Journal: Applied Soft Computing
Publication Date: Aug 12, 2023
Citations: 3
