This paper develops a human action recognition method for human silhouette sequences based on supervised temporal t-stochastic neighbor embedding (ST-tSNE) and incremental learning. Inspired by the SNE and its variants, ST-tSNE is proposed to learn the underlying relationship between action frames in a manifold, where the class label information and temporal information are introduced to well represent those frames from the same action class. As to the incremental learning, an important step for action recognition, we introduce three methods to perform the low-dimensional embedding of new data. Two of them are motivated by local methods, locally linear embedding and locality preserving projection. Those two techniques are proposed to learn explicit linear representations following the local neighbor relationship, and their effectiveness is investigated for preserving the intrinsic action structure. The rest one is based on manifold-oriented stochastic neighbor projection to find a linear projection from high-dimensional to low-dimensional space capturing the underlying pattern manifold. Extensive experimental results and comparisons with the state-of-the-art methods demonstrate the effectiveness and robustness of the proposed ST-tSNE and incremental learning methods in the human action silhouette analysis.
Read full abstract