The importance of in-the-wild video-based image set recognition is growing steadily, owing to the large amount of video data being collected from sources such as surveillance cameras, drive recorders, smartphones, and the Internet. The content of these videos is often complex, which raises the question of how to perform image set modeling and feature extraction for image set-based classification. In recent years, image set classification methods have advanced considerably by modeling each image set as a covariance matrix, a linear subspace, or a Gaussian distribution, whose characteristic geometries are the Symmetric Positive Definite (SPD) manifold, the Grassmannian manifold, and the Gaussian embedded Riemannian manifold, respectively. However, most existing approaches adopt only a single geometric model to describe each image set, which may discard information useful for classification. To tackle this problem, we propose a novel algorithm that models each image set from a multi-geometric perspective. Specifically, the covariance matrix, linear subspace, and Gaussian distribution are used simultaneously to represent each set. To fuse these heterogeneous Riemannian manifold-valued features, well-established Riemannian kernel functions are first employed to map them into high-dimensional Hilbert spaces. Then, a multi-kernel metric learning framework is devised to embed the learned hybrid kernels into a common lower-dimensional subspace to facilitate classification. To evaluate the classification performance of the proposed algorithm, we conduct experiments on six widely used datasets, each representing a different classification task: video-based face recognition, set-based object categorization, video-based emotion recognition, dynamic scene classification, set-based cell identification, and 3D hand pose estimation. The extensive experimental results confirm its superiority over state-of-the-art methods.
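As an illustration only, the following Python/NumPy sketch builds the three set representations named above from a d × n matrix of frame features and combines one Riemannian kernel per geometry into a hybrid kernel. The specific choices here are assumptions, not necessarily the kernels used in this work: a log-Euclidean Gaussian kernel for SPD matrices, the projection kernel for Grassmannian points, and the common embedding of a Gaussian N(μ, Σ) as the SPD matrix [[Σ + μμᵀ, μ], [μᵀ, 1]]. All function names (`set_covariance`, `set_subspace`, `set_gaussian_spd`, `hybrid_kernel`) and the fixed equal weights are hypothetical placeholders for the learned multi-kernel combination.

```python
import numpy as np

def spd_log(m):
    # Matrix logarithm of an SPD matrix via its eigendecomposition.
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def set_covariance(x, eps=1e-3):
    # x: d x n matrix, one column per frame feature vector.
    # np.cov treats rows as variables; eps * I keeps the result strictly SPD.
    return np.cov(x) + eps * np.eye(x.shape[0])

def set_subspace(x, q):
    # Orthonormal basis of the q-dimensional dominant subspace (a Grassmannian point).
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :q]

def set_gaussian_spd(x, eps=1e-3):
    # Assumed embedding of N(mu, Sigma) as a (d+1) x (d+1) SPD matrix.
    mu = x.mean(axis=1, keepdims=True)
    sigma = set_covariance(x, eps)
    top = np.hstack([sigma + mu @ mu.T, mu])
    bottom = np.hstack([mu.T, np.ones((1, 1))])
    return np.vstack([top, bottom])

def log_euclidean_kernel(a, b, sigma=1.0):
    # Gaussian kernel on the Frobenius distance between matrix logarithms.
    diff = spd_log(a) - spd_log(b)
    return np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2))

def projection_kernel(u1, u2):
    # Projection kernel on the Grassmannian: ||U1^T U2||_F^2.
    return np.sum((u1.T @ u2) ** 2)

def hybrid_kernel(x1, x2, q=3, weights=(1 / 3, 1 / 3, 1 / 3), sigma=1.0):
    # Fixed weighted sum of the three per-geometry kernels (hypothetical weights).
    k_spd = log_euclidean_kernel(set_covariance(x1), set_covariance(x2), sigma)
    k_grass = projection_kernel(set_subspace(x1, q), set_subspace(x2, q))
    k_gauss = log_euclidean_kernel(set_gaussian_spd(x1), set_gaussian_spd(x2), sigma)
    w1, w2, w3 = weights
    return w1 * k_spd + w2 * k_grass + w3 * k_gauss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(5, 20))  # hypothetical set: 20 frames, 5-dim features
    b = rng.normal(size=(5, 30))  # hypothetical set: 30 frames, 5-dim features
    print(hybrid_kernel(a, a), hybrid_kernel(a, b))
```

In a full pipeline, the per-geometry Gram matrices over all training sets would be fed to the metric learning stage rather than summed with fixed weights as done here.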