Abstract

Ever-improving sensing technologies enable fast and accurate collection of large-scale spatiotemporal data, recorded by heterogeneous multimodal sensors in application domains ranging from medicine and biology to robotics and traffic control. In this dissertation, we propose frameworks for learning the underlying representations of these data in an unsupervised manner, tailored to several emerging applications: indoor navigation and mapping, neuroscience hypothesis testing, time series forecasting, 3D motion segmentation, and human action recognition.
(1) We developed an unsupervised framework for real-time depth and view-angle estimation from inertially augmented video of an indoor scene, employing geometry-based machine learning and deep learning models.
(2) We introduced a hierarchical deep generative factor analysis framework for temporal modeling of neuroimaging datasets. Our model approximates high-dimensional data by a product of time-dependent weights and spatially dependent factors, which are in turn represented in terms of lower-dimensional latents. This framework can be extended to perform clustering in the low-dimensional temporal latent space or to perform factor analysis in the presence of a control signal.
(3) We developed a deep switching dynamical system for modeling multidimensional time-series data. Specifically, we employ a deep vector autoregressive latent model switched by a chain of discrete latents to capture higher-order multimodal latent dependencies. The result is a flexible model that (i) provides a collection of potentially interpretable states abstracted from the process dynamics, and (ii) performs short- and long-term vector time series prediction in a complex multi-relational setting.
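The switching dynamics in (3) can be illustrated with a minimal sketch: a discrete Markov chain selects, at each step, which vector autoregressive transition drives the continuous latent. All names, dimensions, and parameter values below are illustrative stand-ins, not the dissertation's deep model.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, T = 2, 3, 50  # discrete states, latent dimension, time steps

# Hypothetical per-state VAR(1) transition matrices and a Markov chain
# over the discrete switching variable.
A = np.stack([0.9 * np.eye(D), -0.5 * np.eye(D)])
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])

z = np.zeros(T, dtype=int)   # discrete switching chain
x = np.zeros((T, D))         # continuous latent trajectory
for t in range(1, T):
    z[t] = rng.choice(K, p=P[z[t - 1]])                        # sample next regime
    x[t] = A[z[t]] @ x[t - 1] + 0.1 * rng.standard_normal(D)   # VAR(1) step + noise
```

In the dissertation's model the linear maps `A[k]` are replaced by deep networks and both chains are inferred from data; the sketch only shows the generative skeleton of a switching autoregressive latent process.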
(4) We developed a dynamical deep generative latent model for segmenting 3D pose data over time that parses the meaningful intrinsic states in the dynamics of these data and enables low-level dynamical generation and segmentation of skeletal movements. Our model encodes highly correlated skeletal data into a small set of spatial bases of switching temporal processes in a low-dimensional latent framework. We extended this model to human action recognition by decoding from these low-dimensional latents to the motion data and their associated action labels. --Author's abstract
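The decomposition underlying (2) and (4) — highly correlated observations expressed as a few temporal processes mixed through a small spatial basis — can be sketched with a plain low-rank factorization. The truncated SVD below is only an illustrative stand-in for the learned generative latents; the data, dimensions, and rank are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

T, J, r = 100, 30, 4  # time steps, skeletal coordinates, number of spatial bases

# Synthetic "pose" data: a few temporal processes mixed through a
# small spatial basis, plus observation noise.
W_true = rng.standard_normal((T, r))   # time-dependent weights
B_true = rng.standard_normal((r, J))   # spatially dependent factors
Y = W_true @ B_true + 0.01 * rng.standard_normal((T, J))

# Recover a rank-r factorization Y ~= W @ B via truncated SVD.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
W, B = U[:, :r] * s[:r], Vt[:r]
err = np.linalg.norm(Y - W @ B) / np.linalg.norm(Y)
```

Because the data are effectively rank `r`, the relative reconstruction error `err` is tiny; the dissertation's models additionally place switching temporal dynamics on the weights and learn the factorization within a deep generative framework.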
