Abstract
Action recognition in videos has attracted extensive research interest because of its wide range of applications. That task assigns a specific action class to each video. In this paper, we study the problem of action similarity labeling (ASLAN), which is to verify whether two videos present the same type of action. We show that both the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD), computed from dense trajectory features, can achieve state-of-the-art performance on the ASLAN benchmark. Our main contribution is a large margin dimensionality reduction (LMDR) method that compresses high-dimensional FV and VLAD representations. Specifically, we use a hinge loss objective function and stochastic gradient descent to optimize a discriminative projection matrix for these vectors. Extensive experiments on the ASLAN dataset show that LMDR not only significantly reduces the dimensionality but also improves verification performance.
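The abstract states only that LMDR combines a hinge loss with stochastic gradient descent to learn a projection matrix. A minimal sketch, assuming a pairwise large-margin formulation borrowed from metric learning for face verification: a projection W and threshold b are trained so that same-action pairs land at squared distance below b - 1 and different-action pairs above b + 1, using the per-pair loss max(0, 1 - y(b - ||W(x_i - x_j)||^2)). The paper's actual objective, initialisation, and hyperparameters may differ, and all names here (train_lmdr, same_action, lr, epochs) are hypothetical. Plain Python lists keep it dependency-free; real FV and VLAD vectors are far larger.

```python
import random


def _sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def _matvec(W, v):
    """Matrix-vector product W v, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def projected_distance(W, xi, xj):
    """Squared Euclidean distance ||W xi - W xj||^2 in the projected space."""
    return sum(z * z for z in _matvec(W, _sub(xi, xj)))


def same_action(W, b, xi, xj):
    """Verification rule: declare 'same action' when the distance is below b."""
    return projected_distance(W, xi, xj) < b


def train_lmdr(pairs, dim_out, b=1.0, lr=0.01, epochs=50, seed=0):
    """Learn a dim_out x dim_in projection W and a threshold b by SGD.

    pairs: list of (xi, xj, y), with y = +1 for the same action, -1 otherwise.
    Assumed per-pair hinge loss: max(0, 1 - y * (b - ||W(xi - xj)||^2)).
    """
    rng = random.Random(seed)
    dim_in = len(pairs[0][0])
    # Small random initialisation (an assumption; PCA-based init is an alternative).
    W = [[rng.gauss(0.0, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]
    order = list(pairs)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)
        for xi, xj, y in order:
            d = _sub(xi, xj)
            wd = _matvec(W, d)
            dist = sum(z * z for z in wd)
            if y * (b - dist) < 1:  # margin violated, so the hinge is active
                # Gradients of the active hinge: dL/dW = 2 y (W d) d^T, dL/db = -y
                for r in range(dim_out):
                    g = 2.0 * y * wd[r]
                    for c in range(dim_in):
                        W[r][c] -= lr * g * d[c]
                b += lr * y
    return W, b
```

In this sketch xi and xj stand in for the FV or VLAD vectors of two videos, and dim_out would be chosen much smaller than their dimension; pairs are then verified with same_action using the learned W and b.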