Abstract

Encoder-decoder recurrent neural network models (RNN Seq2Seq) have achieved success in a broad range of computational tasks and applications. They have been shown to be effective in modeling data with both temporal and spatial dependencies for translation and prediction tasks. In this study, we propose an embedding approach to visualize and interpret the representations that these models learn. Furthermore, we show that the embedding is an effective method for unsupervised learning and can be used to estimate the optimality of model training. In particular, we demonstrate that embedding-space projections of the decoder states of an RNN Seq2Seq model trained on sequence prediction organize into clusters that capture similarities and differences in the dynamics of these sequences. This behavior amounts to unsupervised clustering of spatio-temporal features and can be employed for time-dependent problems such as temporal segmentation, clustering of dynamic activity, self-supervised classification, action recognition, and failure prediction. We test and demonstrate the application of the embedding methodology on time sequences of 3D human body poses and show that it provides a high-quality unsupervised categorization of movements. The source code with examples is available in a GitHub repository.
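
For concreteness, the following minimal PyTorch sketch illustrates the embedding procedure described above: unroll a Seq2Seq decoder, record its hidden state at every step, project the states to a low-dimensional space, and cluster the projections. The toy Decoder class, its random weights, the random stand-ins for encoder states, and the choice of PCA plus k-means are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    torch.manual_seed(0)

    # Toy stand-in for a trained Seq2Seq decoder: a GRU cell plus a
    # linear readout mapping the hidden state back to feature space.
    class Decoder(nn.Module):
        def __init__(self, dim, hidden):
            super().__init__()
            self.cell = nn.GRUCell(dim, hidden)
            self.readout = nn.Linear(hidden, dim)

        def forward(self, x, h):
            h = self.cell(x, h)
            return self.readout(h), h

    dim, hidden, steps, n_seq = 54, 128, 25, 200
    decoder = Decoder(dim, hidden)

    # Random vectors stand in for the encoder's final states; in practice
    # these come from encoding the observed part of each input sequence.
    h = torch.randn(n_seq, hidden)
    x = torch.zeros(n_seq, dim)
    states = []
    with torch.no_grad():
        for _ in range(steps):
            x, h = decoder(x, h)   # one prediction step
            states.append(h)       # record the decoder hidden state

    # Flatten all decoder states, project to 2D, cluster the projections.
    flat = torch.stack(states, dim=1).reshape(-1, hidden).numpy()
    proj = PCA(n_components=2).fit_transform(flat)
    labels = KMeans(n_clusters=8, n_init=10).fit_predict(proj)
    print(proj.shape, labels[:10])

On real data, sequences whose dynamics are similar should produce hidden-state trajectories that land in the same cluster, which is the basis of the unsupervised categorization reported in the abstract.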

Highlights

  • Recurrent Sequence to Sequence (Seq2Seq) network models use internal states to process sequences of inputs (Hochreiter and Schmidhuber, 1997; Cho et al., 2014; Sutskever et al., 2014; Luong et al., 2015)

  • Our results indicate that the Adversarial Geometry-Aware Encoder-Decoder (AGED) model (Gui et al., 2018), one of the state-of-the-art motion prediction models using RNN Seq2Seq as the predictor, obtains the best prediction results

  • We propose a novel construction of an interpretable embedding for the hidden states of the Seq2Seq model

Introduction

Recurrent Sequence to Sequence (Seq2Seq) network models use internal states to process sequences of inputs (Hochreiter and Schmidhuber, 1997; Cho et al., 2014; Sutskever et al., 2014; Luong et al., 2015). What distinguishes Seq2Seq models is that they consist of encoder and decoder components. The encoder processes the input sequence and passes its last internal state to the decoder as an “initialization”; from this state, the decoder transforms the input or generates novel sequences with a similar distribution. Such an architecture and its variants, e.g., attention-based models (Luong et al., 2015), have been applied successfully to translation and prediction tasks.
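
The encoder-to-decoder handoff described above can be sketched in a few lines. The following minimal PyTorch example uses an assumed single-layer GRU architecture, not the exact model studied in the paper: the encoder's final hidden state initializes the decoder, which is then unrolled step by step, feeding each prediction back as its next input.

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, dim, hidden):
            super().__init__()
            self.encoder = nn.GRU(dim, hidden, batch_first=True)
            self.decoder = nn.GRU(dim, hidden, batch_first=True)
            self.readout = nn.Linear(hidden, dim)

        def forward(self, src, steps):
            _, h = self.encoder(src)       # h: final encoder state
            x = src[:, -1:, :]             # seed with last observed frame
            preds = []
            for _ in range(steps):
                y, h = self.decoder(x, h)  # decoder starts from encoder state
                x = self.readout(y)        # feed prediction back as next input
                preds.append(x)
            return torch.cat(preds, dim=1) # (batch, steps, dim)

    model = Seq2Seq(dim=54, hidden=128)
    future = model(torch.randn(8, 50, 54), steps=25)
    print(future.shape)  # torch.Size([8, 25, 54])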
