Abstract

Extracting the underlying dynamics of objects in image sequences is one of the challenging problems in computer vision. Dynamic mode decomposition (DMD) has recently attracted attention as a method for obtaining modal representations of nonlinear dynamics from general multivariate time-series data without explicit prior information about the dynamics. In this paper, we propose a convolutional autoencoder (CAE)-based DMD (CAE-DMD) for accurate modeling of the underlying dynamics in videos. We develop a modified CAE model that encodes images into latent vectors and incorporates DMD on the latent vectors to extract DMD modes. These modes are split into background and foreground modes for foreground modeling in videos, or used for video classification tasks. The latent vectors are then mapped through a decoder to recover the input image sequences. We train the network in an end-to-end manner, i.e., by minimizing the mean square error between the original and reconstructed images, which yields accurate extraction of the underlying dynamic information in the videos. We empirically investigate the performance of CAE-DMD in two applications, background/foreground extraction and video classification, on synthetic and publicly available datasets.
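The DMD step applied to the latent vectors can be sketched as follows. This is a minimal exact-DMD implementation in NumPy, not the paper's actual code: the function name, the latent-matrix layout (one latent vector per column, ordered in time), and the truncation rank `r` are illustrative assumptions. Modes whose eigenvalues lie near 1 in magnitude correspond to slowly varying (background) dynamics, which motivates the background/foreground split described above.

```python
import numpy as np

def dmd_modes(Z, r):
    """Exact DMD on a snapshot matrix Z of shape (d, m).

    Columns of Z are latent vectors ordered in time (hypothetical layout).
    Returns the DMD eigenvalues and modes for a rank-r truncation.
    """
    # Paired snapshot matrices: Y ~= A X for the (unknown) linear operator A
    X, Y = Z[:, :-1], Z[:, 1:]

    # Rank-r truncated SVD of X
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]

    # Low-rank approximation of A projected onto the POD subspace
    Sinv = np.diag(1.0 / s)
    Atilde = U.conj().T @ Y @ Vh.conj().T @ Sinv

    # Eigendecomposition gives DMD eigenvalues; exact DMD modes follow
    eigvals, W = np.linalg.eig(Atilde)
    Phi = Y @ Vh.conj().T @ Sinv @ W
    return eigvals, Phi

# Background modes: |eigenvalue| close to 1 (slow dynamics);
# the remaining modes are treated as foreground.
```

On data generated by a linear system, the recovered eigenvalues match those of the generating operator, which is a quick sanity check for any DMD implementation.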
