Abstract

Human activity recognition is an important and challenging topic for the computer vision research community. Action representations based on deep learning models are currently the dominant technique. However, supervised convolutional neural networks require large computational and memory resources to optimize their parameters. Recently, a simple unsupervised deep learning architecture, the Principal Component Analysis Network (PCANet), has emerged as an alternative to Convolutional Neural Networks (CNNs) and has achieved strong results in various vision applications. Meanwhile, encoding and representation techniques based on Bag of Words (BoW) and the Vector of Locally Aggregated Descriptors (VLAD) have proven highly successful in several visual tasks, particularly activity recognition. This work presents a novel human activity recognition technique that combines global and local PCANet features with the BoW and VLAD encoding schemes. Both global and local features are learned by PCANet from selected frames of each action video. The dimensionality of these features is then reduced via Whitening PCA (WPCA), and the encoding schemes are applied to both feature types to produce the final descriptor for each action. Finally, a Support Vector Machine (SVM) classifier is trained for recognition. Several experiments are conducted on the UCF Sports dataset to evaluate our method. All experimental results, obtained with a leave-one-out cross-validation (LOOCV) strategy, are satisfactory and comparable.
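
To make the VLAD encoding step concrete, the following is a minimal sketch and not the implementation used in this work. It assumes NumPy, a codebook that is already available (drawn at random here purely for illustration; in practice it would typically be learned by clustering the descriptors), and signed square-root plus L2 normalization, a common choice that the abstract does not specify. The name `vlad_encode` and all array sizes are hypothetical.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """Encode local descriptors (n, d) against a codebook (k, d) as a VLAD vector of length k * d."""
    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    k, d = codebook.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Accumulate residuals between assigned descriptors and their codeword.
            vlad[i] = (members - codebook[i]).sum(axis=0)

    vlad = vlad.ravel()
    # Assumed normalization: signed square root followed by L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# Hypothetical data: 50 local descriptors of dimension 8 and a 4-word codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 8))
descriptors = rng.normal(size=(50, 8))
encoding = vlad_encode(descriptors, codebook)
```

In a pipeline like the one described above, one such vector per video would then be passed to the SVM classifier; a BoW encoding would instead count how many descriptors fall on each codeword.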
