T-VLAD: Temporal vector of locally aggregated descriptor for multiview human action recognition

Hajra Binte Naeem, Fiza Murtaza, Muhammad Haroon Yousaf, Sergio A Velastin

doi:10.1016/j.patrec.2021.04.023

Abstract

• We encode the long-term temporal structure of actions using single-stream C3D features from short segments of video.
• We include time-order information when encoding the temporal sequence of a complete action video, hence the name temporal VLAD (T-VLAD).
• T-VLAD timestamps primitive action motions in short segments of video, facilitating view-invariant action recognition.
• State-of-the-art results are shown on fixed-setup multiview datasets, MuHAVi and IXMAS.
• The proposed encoding scheme, T-VLAD, performs equally well on a dynamic background dataset, UCF-101.

Robust view-invariant human action recognition (HAR) requires an effective representation of the temporal structure of actions in multi-view videos. This study explores a view-invariant action representation based on convolutional features. Action representation over long video segments is computationally expensive, whereas features from short video segments limit temporal coverage to a local window. Previous methods are based on complex multi-stream deep convolutional feature maps extracted over short segments. To address this, a novel framework is proposed based on a temporal vector of locally aggregated descriptors (T-VLAD). T-VLAD encodes the long-term temporal structure of the video using single-stream convolutional features over short segments. The size of a standard VLAD vector is a multiple of its feature codebook size (256 is normally recommended). VLAD is modified to incorporate the time-order information of segments, so that the T-VLAD vector size is a multiple of its smaller time-order codebook size. Previous methods have not been extensively validated for view variation. Results are validated in a challenging setup, where one view is used for testing and the remaining views are used for training. State-of-the-art results have been obtained on three multi-view datasets with fixed cameras: IXMAS, MuHAVi and MCAD. The proposed encoding approach, T-VLAD, also works equally well on a dynamic background dataset, UCF101.
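To make the vector-size remark concrete, the following is a minimal sketch rather than the authors' implementation. `vlad_encode` follows the standard VLAD recipe (sum of residuals to the nearest codeword, then power and L2 normalization), giving a vector of length K x D for K codewords of dimension D. `time_order_vlad` is a purely hypothetical illustration of how grouping by a small time-order codebook would give a vector of length T x D; the function names, the use of normalized segment positions as time codewords, and the `reference` vector are all assumptions not taken from the paper.

```python
import numpy as np


def _normalize(vec):
    # Signed square-root (power) normalization followed by L2 normalization.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def vlad_encode(features, codebook):
    """Standard VLAD. features: (N, D), codebook: (K, D) -> vector of length K*D."""
    # Squared Euclidean distance from every feature to every codeword.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    vlad = np.zeros(codebook.shape)
    for k in range(codebook.shape[0]):
        members = features[nearest == k]
        if len(members):
            # Accumulate residuals of the features assigned to codeword k.
            vlad[k] = (members - codebook[k]).sum(axis=0)
    return _normalize(vlad.ravel())


def time_order_vlad(features, time_codebook, reference):
    """Hypothetical time-order variant (an assumption, not the paper's method).

    features: (N, D) segment features in temporal order.
    time_codebook: (T,) codewords over normalized segment position in [0, 1].
    reference: (D,) vector against which residuals are taken.
    Returns a vector of length T*D.
    """
    n = features.shape[0]
    positions = np.arange(n) / max(n - 1, 1)
    # Assign each segment to its nearest time-order codeword.
    nearest = np.abs(positions[:, None] - time_codebook[None, :]).argmin(axis=1)
    out = np.zeros((len(time_codebook), features.shape[1]))
    for t in range(len(time_codebook)):
        members = features[nearest == t]
        if len(members):
            out[t] = (members - reference).sum(axis=0)
    return _normalize(out.ravel())
```

With D-dimensional segment features, a 256-word feature codebook yields a 256 x D vector, whereas a time-order codebook with only a handful of codewords yields a proportionally shorter one, which is the size reduction the abstract alludes to.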


Journal: Pattern Recognition Letters	Publication Date: May 4, 2021
Citations: 13

Similar Papers

Internet-of-Things-Based Suspicious Activity Recognition Using Multimodalities of Computer Vision for Smart City Security
Amjad Rehman ... Robertas Damaševičius
Security and Communication Networks | VOL. 2022
05 Oct 2022

Review of Literature on Human Activity Detection and Recognition
Pavankumar Naik ... R Srinivasa Rao Kunte
International Journal of Management, Technology, and Social Sciences
23 Nov 2023

Human Daily Activity and Fall Recognition Using a Smartphone’s Acceleration Sensor
Charikleia Chatzaki ... George Vavoulas
01 Jan 2017

Human Activity Recognition for Healthcare using Smartphones
Godwin Ogbuabor ... Robert La
26 Feb 2018
