Abstract

Human action recognition has emerged as a challenging research domain for video understanding and analysis. Consequently, extensive research has been conducted to improve the recognition of human actions. Human activity recognition has various real-time applications, such as patient monitoring, in which patients are observed among a group of other people and identified by their abnormal activities. Our goal is multi-class abnormal action detection, for individuals as well as groups, that differentiates multiple abnormal human actions in video sequences. In this paper, the You Only Look Once (YOLO) network is used as the backbone CNN model. To train the CNN model, we constructed a large dataset of patient videos by labeling each frame with a set of patient actions and the patients' positions. We retrained the backbone CNN model on 23,040 labeled images of patient actions for 32 epochs. The proposed model assigns a confidence score and an action label to each frame, and then labels the video sequence with the most recurrent action label. The present study shows that the accuracy of abnormal action recognition is 96.8%. Our proposed approach differentiated abnormal actions with an improved F1-score of 89.2%, which is higher than that of state-of-the-art techniques. The results indicate that the proposed framework can benefit hospitals and elder care homes for patient monitoring.
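The step of labeling a video sequence by its recurrent frame-level action label can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `video_label`, the example action labels, and the choice to report the mean confidence of the winning label are all assumptions made for illustration. The per-frame predictions are assumed to be already available as (label, confidence) pairs.

```python
from collections import Counter
from typing import List, Tuple


def video_label(frame_preds: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Pick the most frequent per-frame label for a video sequence.

    frame_preds holds one (action_label, confidence) pair per frame.
    Returns the winning label and the mean confidence of the frames
    that carry it (the averaging rule is an assumption, not taken
    from the paper).
    """
    if not frame_preds:
        raise ValueError("no frame predictions given")
    # Count how often each action label occurs across the frames.
    counts = Counter(label for label, _ in frame_preds)
    # most_common(1) returns the label with the highest count; on a tie
    # the label that was encountered first wins.
    label, _ = counts.most_common(1)[0]
    scores = [conf for lab, conf in frame_preds if lab == label]
    return label, sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical per-frame detections for one short clip.
    preds = [("falling", 0.9), ("sitting", 0.6), ("falling", 0.7)]
    print(video_label(preds))
```

A simple frequency vote like this smooths over isolated misclassified frames, at the cost of ignoring the order in which actions occur.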

Highlights

  • In recent years, action recognition has attracted considerable attention in video analysis [1]

  • We obtained improved performance compared with previous work using an extensive dataset

  • We are faced with the following challenge: the video sequences used by activity recognition systems are most often captured from arbitrary camera viewpoints, so the outputs of the system need to be invariant across heterogeneous camera viewpoints


Summary

Introduction

Action recognition has attracted considerable attention in video analysis [1]. Rapid advances in smart devices and deep learning techniques have led to recent growth in applications of activity recognition. Many computer vision and machine learning based approaches have emerged with the aim of developing improved human recognition models [3,4]. Due to the outstanding performance of CNNs [8] in image classification and object detection, many researchers have started to deploy CNNs for video classification with some modification [6]. Approaches such as the Region-based Convolutional Neural Network (R-CNN) use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes [9]. In YOLO, by contrast, a single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. It trains on full images and directly optimizes detection performance [12].

Literature Review
Overall Flow
Back-Bone CNN Model
Dataset
Testing
Implementation Details
Performance Metrics
Training and Testing
Comparison with Previous Work
Limitations and Future Work
Findings
Conclusions
