Abstract

Dimensionality reduction has emerged as one of the prominent fields of research since it provides a solution to a wide class of problems, such as compression, classification, regression, feature analysis, and visual recognition. Generally, subspace learning algorithms find a low-dimensional subspace from given high-dimensional data, wherein samples from different classes can be well separated. In the past decades, several types of dimensionality reduction and subspace selection algorithms have been widely used for visual data analysis. Conventional algorithms such as principal component analysis (PCA) and linear discriminant analysis (LDA) perform well under the assumption that training and test data follow similar distributions. However, a key drawback remains: the distributions of the training and test data are mismatched when samples are drawn from different but related sources. These algorithms also fail to incorporate temporal statistical variations in the data distribution when applied to time-series data (videos). Consequently, they yield unsatisfactory recognition performance on real-world visual analysis problems. To overcome these issues, our aim in this research is to consider invariance and stationarity in subspace analysis for computer vision applications such as video retrieval, human behaviour analysis, event analysis, and activity recognition. More specifically, we address two challenging and fundamental tasks: (1) video classification (scene, dynamic texture, and action recognition) and (2) visual domain adaptation. To address the first task, we propose a subspace learning approach that focuses on extracting the stationary parts of all videos in the same class. The notion of stationarity is intuitively well-adapted to modeling the temporal nature of the video signal and lets us make use of many image features.
Instead of modeling temporal information in the features, our method explicitly accounts for it when learning the latent space. As a consequence, the resulting video representation is particularly well-suited for classification purposes. Our experimental evaluation shows that our approach outperforms baselines from different groups of methods on several video classification tasks, namely dynamic texture recognition, scene classification, and action recognition. The second task addressed by this work is domain adaptation for visual recognition. In our proposed approach, we follow the intuitive idea of learning a domain-invariant subspace by matching the distributions of the transformed training and test data using the Maximum Mean Discrepancy (MMD). This, we believe, makes better use of the expressive power of the kernel in MMD compared to other approaches such as Transfer Component Analysis (TCA). Although motivated by MMD, TCA measures the distance between the sample means in a lower-dimensional space rather than in a Reproducing Kernel Hilbert Space (RKHS), which somewhat contradicts the intuition behind the use of kernels. Furthermore, we extend the framework to the semi-supervised scenario. Experiments on benchmark domain adaptation datasets for visual recognition show that, in comparison to well-known methods, the proposed approach obtains a significant improvement in classification accuracy. For the same task, we propose a second domain adaptation method that exploits the Riemannian structure of the statistical manifold when learning the invariant subspaces or samples. To this end, we introduce the use of the Hellinger distance, which is related to the geodesic distance on the space of probability distributions. While the Hellinger distance has been employed for dimensionality reduction, to the best of our knowledge, our approach is the first attempt at exploiting the Riemannian geometry of the statistical manifold for domain adaptation.
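To make the distribution-matching criterion concrete, the empirical squared MMD between two samples in the RKHS induced by an RBF kernel can be computed as below. This is a generic illustration of the MMD statistic, not the thesis's actual subspace-learning formulation; the bandwidth `sigma` is an assumed parameter that would normally be tuned.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Biased empirical estimate of the squared MMD between the samples X and Y:
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2.0 * rbf_kernel(X, Y, sigma).mean())
```

A domain adaptation method in this spirit would learn a transformation of the source and target data that drives this statistic toward zero while preserving discriminative structure.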
We show that our sample-selection and subspace-based approaches, in conjunction with the Hellinger distance for distribution matching, consistently outperform comparable approaches on the tasks of visual domain adaptation and WiFi localization.
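For illustration, the Hellinger distance between two discrete probability distributions has a simple closed form, bounded in [0, 1] and related to the geodesic distance on the statistical manifold. This sketch shows the distance itself, not the thesis's adaptation algorithm built on it.

```python
import numpy as np

def hellinger(p, q):
    # Hellinger distance between discrete distributions p and q:
    # H(p, q) = (1 / sqrt(2)) * || sqrt(p) - sqrt(q) ||_2.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)
```

It equals 0 for identical distributions and 1 for distributions with disjoint support, which makes it a convenient bounded criterion for matching source and target distributions.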
