Spatio-temporal categorization for first-person-view videos using a convolutional variational autoencoder and Gaussian processes.

Masatoshi Nagano, Ichiro Kobayashi, Daichi Mochihashi, Tomoaki Nakamura, Takayuki Nagai

doi:10.3389/frobt.2022.903450

Abstract

This study proposes HcVGH, a method that learns spatio-temporal categories by segmenting first-person-view (FPV) videos captured by mobile robots. Humans perceive continuous high-dimensional information by dividing it into meaningful segments and categorizing them. This unsupervised segmentation capability is considered important for mobile robots to learn spatial knowledge. HcVGH combines a convolutional variational autoencoder (cVAE) with HVGH, an earlier method based on the hierarchical Dirichlet process-variational autoencoder-Gaussian process-hidden semi-Markov model, which comprises deep generative and statistical models. The experiments used FPV videos of an agent in a simulated maze environment. FPV videos contain spatial information, so spatial knowledge can be learned by segmenting them. Using this FPV-video dataset, the segmentation performance of the proposed model was compared with that of two previous models: HVGH and the hierarchical recurrent state space model. HcVGH achieved an average segmentation F-measure of 0.77 and outperformed both baseline methods. Furthermore, the experimental results showed that the parameters representing the movability of the maze environment can be learned.

Journal: Frontiers in Robotics and AI
Publication Date: Sep 30, 2022
Citations: 2
License type: CC BY 4.0

Similar Papers

Co-display Content Service for First-Person Videos of Smart Glass
Bokyung Sung ... Ilju Ko
23 Nov 2016

Feasibility Study for Pose Estimation of Small UAS in Known 3D Environment Using Geometric Hashing
Costas Armenakis ... Julien Li-Chee-Ming
Photogrammetric Engineering & Remote Sensing | VOL. 80
01 Dec 2014

Combining deep generative and discriminative models for Bayesian semi-supervised learning
Jonathan Gordon ... José Miguel Hernández-Lobato
Pattern Recognition | VOL. 100
14 Dec 2019

Social Activity Measurement by Counting Faces Captured in First-Person View Lifelogging Video
Akane Okuno ... Yasuyuki Sumi
11 Mar 2019
