Abstract
Neuromorphic vision sensors provide low-power sensing and capture salient spatio-temporal events. The majority of existing neuromorphic sensing work focuses on object detection. However, since these sensors only record events, they provide an efficient signal domain for privacy-aware surveillance tasks. This paper explores how neuromorphic vision sensor data streams can be analysed for human action recognition, which is a challenging application. The proposed method is based on handcrafted features. It consists of a pre-processing step for removing noisy events, followed by the extraction of handcrafted local and global feature vectors corresponding to the underlying human action. The local features are extracted from a set of high-order descriptive statistics of the spatio-temporal events in a time-window slice, while the global features are extracted from the frequencies of occurrence of the temporal event sequences. Low-complexity classifiers, such as support vector machines (SVMs) and K-nearest neighbours (KNN), are then trained on these feature vectors. The proposed method is evaluated on three groups of datasets: emulator-based, re-recording-based, and native NVS-based. It outperforms existing methods in human action recognition accuracy by 0.54%, 19.3%, and 25.61% on the E-KTH, E-UCF11, and E-HMDB51 datasets, respectively. This paper also reports results for three further datasets, E-UCF50, R-UCF50, and N-Actions, presented for the first time for human action recognition in the neuromorphic vision sensor domain.
Highlights
Neuromorphic vision sensing (NVS), also known as dynamic vision sensing or event camera sensing, has emerged recently and is capable of capturing fast spatio-temporal spikes in a scene with low power consumption [1]–[8].
Our proposed method consists of a pre-processing step followed by the generation of a feature vector that captures local and global features corresponding to the underlying human action.
The local features were extracted from a set of high-order descriptive statistics of the spatio-temporal events in a time-window slice, while the global features were extracted from the frequencies of occurrence of the temporal event sequences.
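As a rough illustration of the local-feature step, the sketch below computes high-order descriptive statistics (mean, standard deviation, skewness, kurtosis) over the events falling in one time-window slice. The event layout, window boundaries, and choice of statistics here are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def _moments(v):
    """Mean, standard deviation, skewness and kurtosis of a 1-D array."""
    mu, sigma = v.mean(), v.std()
    if sigma == 0:
        return np.array([mu, 0.0, 0.0, 0.0])
    z = (v - mu) / sigma
    return np.array([mu, sigma, (z ** 3).mean(), (z ** 4).mean()])

def local_features(events, t_start, t_end):
    """Hypothetical local-feature sketch.

    `events` is assumed to be an (N, 4) float array of [x, y, t, polarity]
    rows, with timestamps in the third column.
    """
    # Keep only the events inside the requested time-window slice.
    window = events[(events[:, 2] >= t_start) & (events[:, 2] < t_end)]
    if window.shape[0] == 0:
        return np.zeros(12)
    # Four statistics for each of the x, y and t coordinates -> 12 values.
    return np.concatenate([_moments(window[:, c]) for c in range(3)])
```

A feature vector of this kind, computed per window slice, could then be fed to a low-complexity classifier such as an SVM or KNN.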
Summary
Neuromorphic vision sensing (NVS), also known as dynamic vision sensing or event camera sensing, has emerged recently and is capable of capturing fast spatio-temporal spikes (changes) in a scene with low power consumption [1]–[8]. Such data takes the form of a continuous stream of spatio-temporal events or spikes, as opposed to the regularly and uniformly sampled spatio-temporal values of traditional imaging systems such as active pixel sensing (APS). As NVS encodes the intensity change at each pixel and samples at non-uniform rates, events with a high frequency of occurrence correspond to high motion in the scene; this avoids the motion blur caused by high-speed motion that is often seen in conventional APS cameras. Such a high motion response means that an NVS-based camera is regarded as a data-driven sensor, since the NVS output depends on the magnitude of the apparent motion in the scene [23].