Abstract

Human activities in a scene are often monitored by human agents in order to recognize potential threats or danger to a human being. However, with the increasing number of cameras used in the public and private sectors, human activity recognition using purely human labour is no longer feasible, and novel algorithms need to be developed to automate the recognition process. Based on the number of cameras available, human activity recognition can be divided into single-camera and multi-camera human activity recognition.

In single-camera human activity recognition, it is often required to categorize human activities in the scene as normal or abnormal. This problem calls for features that can unambiguously characterize and represent normal object behaviour, so that any behaviour deviating from it can be identified as abnormal human activity. To address this issue, we propose a feature based on the spatio-temporal characteristics of an image sequence in a single-camera scenario, where extraction of the spatio-temporal features is based on a modified 3-dimensional Harris function. To test the proposed algorithm, spatio-temporal features of pedestrians walking in a street scene from the University of California San Diego (UCSD) dataset are modelled as normal human activity. This model is then used to differentiate abnormal activities in the scene, such as bikers and skaters, from normal human activity. Existing approaches such as optical flow and the mixture of dynamic textures model are replicated to compare against the proposed algorithm. On the UCSD dataset, the proposed algorithm shows competitive performance with a 20 times faster processing speed.

In multi-camera human activity recognition, there are generally multiple classes of human activities captured using multiple cameras. Unlike the single-camera case, the activities are observed from multiple views, and the information from these views needs to be fused effectively in order to detect the human activity. Most existing methods rely on high correspondence between views and have relatively high computational cost. To address this problem, a framework that utilizes a weighted sum of the features from all views is proposed. The videos are first converted to optical flow, and voting features are then calculated from the optical flow. To combine the voting features from multiple views, a weighted sum of the voting features is computed, with the weight factors generated using three novel methods: mean voting features, video labels, and optimized weights. Each set of weight factors is used to calculate a weighted-sum feature vector for multi-camera human activity recognition. Multi-view features calculated using the optimized-weights method yield the best results, and this approach is therefore further extended to incorporate additional feature types: we fuse two types of features, optical flow and motion histogram volume features. The resulting human activity recognition system was tested on the INRIA Xmas Motion Acquisition Sequences (IXMAS) dataset and achieves a recognition rate of 94.85%, which is higher than all reported results for this dataset.
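To make the single-camera feature extraction step concrete, the following is a minimal sketch of a standard 3-dimensional (space-time) Harris response computed over a video volume. The abstract does not detail the modification applied to the Harris function, so this illustrates only the conventional baseline formulation; the function name and parameter values are assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def spatio_temporal_harris(volume, sigma=2.0, tau=1.5, k=0.005):
        # volume: video as an array of shape (frames, height, width).
        # Returns the space-time Harris response; large positive values
        # mark candidate spatio-temporal interest points.
        v = volume.astype(np.float64)

        # Smooth in space (sigma) and time (tau), then take t, y, x gradients.
        smoothed = gaussian_filter(v, sigma=(tau, sigma, sigma))
        Lt, Ly, Lx = np.gradient(smoothed)

        # Entries of the 3x3 second-moment (structure) tensor, each
        # integrated over a larger Gaussian window.
        s = (2 * tau, 2 * sigma, 2 * sigma)
        Mxx = gaussian_filter(Lx * Lx, s)
        Myy = gaussian_filter(Ly * Ly, s)
        Mtt = gaussian_filter(Lt * Lt, s)
        Mxy = gaussian_filter(Lx * Ly, s)
        Mxt = gaussian_filter(Lx * Lt, s)
        Myt = gaussian_filter(Ly * Lt, s)

        # det(M) and trace(M) of the symmetric 3x3 tensor at every voxel.
        det = (Mxx * (Myy * Mtt - Myt * Myt)
               - Mxy * (Mxy * Mtt - Myt * Mxt)
               + Mxt * (Mxy * Myt - Myy * Mxt))
        trace = Mxx + Myy + Mtt

        # Harris response for the 3-D case: det(M) - k * trace(M)^3.
        return det - k * trace ** 3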
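The multi-camera pipeline first converts each view's video to optical flow and computes voting features from it. The exact voting scheme is not specified in the abstract, so the sketch below assumes a simple magnitude-weighted histogram over flow directions; the function name and bin count are hypothetical.

    import cv2
    import numpy as np

    def view_voting_feature(frames, bins=8):
        # frames: list of BGR frames from one camera view.
        # Accumulates a histogram-style voting feature over dense
        # optical-flow directions, with each pixel's vote weighted
        # by its flow magnitude.
        votes = np.zeros(bins)
        prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
        for frame in frames[1:]:
            curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
            hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi),
                                   weights=mag)
            votes += hist
            prev = curr
        # Normalize so features are comparable across views.
        return votes / (votes.sum() + 1e-12)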

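Finally, the per-view voting features are combined as a weighted sum. A minimal sketch, assuming the view weights (for example, those produced by the optimized-weights method) are already available; the function name and feature dimensions are illustrative.

    import numpy as np

    def fuse_views(view_features, weights):
        # view_features: list of V arrays, each of shape (D,) -- one
        # voting feature vector per camera view.
        # weights: length-V view weights, e.g. from the optimized-weights
        # method; how they are optimized is not given in the abstract.
        F = np.stack(view_features)      # shape (V, D)
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                  # normalize weights to sum to 1
        return w @ F                     # fused multi-view feature, shape (D,)

    # Hypothetical usage: three camera views, 128-dimensional features.
    views = [np.random.rand(128) for _ in range(3)]
    fused = fuse_views(views, weights=[0.5, 0.3, 0.2])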