Abstract
Standard methods of subspace clustering are based on self-expressiveness in the original data space, which states that a data point in a subspace can be expressed as a linear combination of other points. However, real data in raw form are usually not well aligned with the linear subspace model, so it is crucial to obtain a proper feature space for high-quality subspace clustering. Inspired by the success of Convolutional Neural Networks (CNN) in extracting powerful features from visual data, and by the block diagonal prior for learning a good affinity matrix from self-expression coefficients, in this paper we propose a jointly trainable feature extraction and affinity learning framework with the block diagonal prior, termed the Convolutional Subspace Clustering Network with Block Diagonal prior (ConvSCN-BD). We solve the joint optimization problem in ConvSCN-BD via an alternating minimization algorithm, which updates the parameters of the convolutional modules and the self-expression coefficients with stochastic gradient descent, and updates the other variables with closed-form solutions. In addition, we derive the connection between the block diagonal prior and the subspace structured norm, and reveal that imposing the block diagonal prior on the affinity matrix essentially incorporates feedback information from spectral clustering. Experiments on three benchmark datasets demonstrate the effectiveness of our proposal.
Highlights
In many problems across computer vision and pattern recognition, we need to deal with high-dimensional datasets, such as images, videos, text, and more
Block diagonal prior [17]–[20] means that the affinity matrix learned from data has a block diagonal structure, which consists of k connected components, corresponding to data points in k subspaces
We propose a jointly trainable framework for subspace clustering, in which the convolution feature extraction and the affinity learning with the block diagonal prior are jointly optimized
Summary
In many problems across computer vision and pattern recognition, we need to deal with high-dimensional datasets, such as images, videos, text, and more. Such high-dimensional data can often be well approximated by a union of low-dimensional subspaces, corresponding to multiple classes or categories [1]. For example, the feature point trajectories associated with a rigidly moving object in a video lie in a union of subspaces of dimension up to 3 [3], and the images of each handwritten digit with different variations lie in a low-dimensional subspace [4]. Subspace clustering has found many applications in image representation and compression [5], motion segmentation [3], and temporal video segmentation [6], among others.
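The self-expressiveness property underlying this line of work can be illustrated with a minimal sketch (not the paper's ConvSCN-BD model, which additionally learns convolutional features and imposes the block diagonal prior during training). Here each point is regressed on all other points with a small ridge penalty, and the resulting affinity matrix exhibits the block diagonal structure that the prior encourages; the subspace dimensions, sample counts, and the regularization weight `lam` are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_subspace(ambient, dim, n):
    """Draw n points from a random dim-dimensional subspace of R^ambient."""
    basis = np.linalg.qr(rng.standard_normal((ambient, dim)))[0]
    return basis @ rng.standard_normal((dim, n))

# Two random 3-dimensional subspaces in R^30, 40 points each.
X = np.hstack([sample_subspace(30, 3, 40), sample_subspace(30, 3, 40)])

# Self-expressiveness: represent each point x_j as a linear combination of
# the OTHER points by solving min_c ||x_j - X_{-j} c||^2 + lam ||c||^2,
# so diag(C) = 0 holds by construction.
n = X.shape[1]
lam = 1e-3
C = np.zeros((n, n))
for j in range(n):
    mask = np.arange(n) != j
    A = X[:, mask]
    c = np.linalg.solve(A.T @ A + lam * np.eye(n - 1), A.T @ X[:, j])
    C[mask, j] = c

# Symmetric affinity matrix built from the self-expression coefficients.
# Ideally it is block diagonal, with one block per subspace, which is what
# spectral clustering on W exploits to recover the segmentation.
W = np.abs(C) + np.abs(C).T
within = W[:40, :40].sum() + W[40:, 40:].sum()
across = W[:40, 40:].sum() + W[40:, :40].sum()
print(within > across)  # within-subspace affinities dominate
```

Since the two subspaces are random and the ambient dimension is high, points from the other subspace contribute little to each representation, so most affinity mass falls inside the two diagonal blocks.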