Abstract
Deep learning has improved the performance of vision-based action recognition algorithms, but such methods require large labeled training datasets, which limits their generality. To address this issue, this paper proposes FOLLOWER, a novel self-deployable ubiquitous action recognition framework that enables a self-motivated user to bootstrap and deploy action recognition services. Our main idea is to build a "fingerprint" library of actions from a small number of user-defined sample action videos and then recognize actions by matching against this library. The key step is constructing a suitable "fingerprint"; to this end, we design a pose action normalized feature extraction method based on three-dimensional pose sequences. FOLLOWER consists of two processes: the guide process and the follow process. The guide process extracts pose action normalized features and selects the inner-class central feature to build the action "fingerprint" library. The follow process extracts pose action normalized features from the target video and uses motion detection, action filtering, and an adaptive weight offset template to identify the actions in the video sequence. Finally, we collect an action video dataset with human pose annotations to study self-deployable action recognition and pose-estimation-based action recognition. Experiments on this dataset show that FOLLOWER can effectively recognize actions in video sequences, with recognition accuracy reaching 96.74%.
Highlights
Recognizing human actions can have many potential applications, including video surveillance, human–computer interfaces, sports video analysis, and video retrieval
Due to the limitations of video-based human pose estimation algorithms [21,22,23] and the lack of video action datasets containing human pose annotations [5], previous human-skeleton-based algorithms rely only on manually annotated data [5,11] or on skeleton data obtained from expensive motion capture equipment such as the Kinect and other RGB-D cameras [10], and so cannot effectively recognize actions from monocular video cameras
Through an action filtering process based on key angle changes and an adaptive weight offset template matching process based on the normalized joints Dynamic Time Warping (DTW) distance, FOLLOWER selects the class in the action fingerprint library with the smallest distance to the input sequence X as the predicted class
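The matching step above can be sketched as a DTW distance between pose sequences followed by a nearest-fingerprint lookup. This is a minimal illustration, not the paper's implementation: it assumes pose sequences of shape (T, J, 3) (T frames, J joints in 3-D), uses mean per-joint Euclidean distance as the per-frame cost, and omits the normalization, action filtering, and adaptive weighting the paper describes; names such as `match_action` are ours.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """DTW distance between two pose sequences of shape (T, J, 3).

    The per-frame cost is the mean Euclidean distance over joints;
    the classic dynamic program aligns the two sequences in time.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1], axis=-1).mean()
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def match_action(query, fingerprints):
    """Return the label whose fingerprint sequence is closest to `query`."""
    return min(fingerprints,
               key=lambda label: dtw_distance(query, fingerprints[label]))
```

Because DTW aligns sequences in time, the same action performed slightly faster or slower still matches its fingerprint, which is why a plain frame-by-frame distance would not suffice here.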
Summary
To address the second challenge, we propose a similarity measure for action sequences, called the normalized joints Dynamic Time Warping (DTW) distance, based on the dynamic time warping algorithm. We use it to calculate the importance of each candidate within an action class and to select the most representative central pose action normalized feature as the class feature, building an action fingerprint library with a dynamic entry mechanism that supports user-defined action expansion. By processing a small number of tagged, user-defined action video sequences, FOLLOWER extracts guide pose action normalized features and establishes the action fingerprint library based on the normalized joints DTW distance. The middle pipeline in the figure shows the extraction process from video frames to pose action normalized features, composed mainly of 3D human pose estimation and pose action normalized feature estimation
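The selection of the most representative central feature per class can be sketched as a medoid computation: the candidate whose total distance to all other candidates in the class is smallest. This is a hedged illustration of the idea, not the paper's code; `select_central_feature` is our name, and `dist_fn` stands in for the normalized joints DTW distance.

```python
def select_central_feature(candidates, dist_fn):
    """Pick the medoid of an action class: the candidate feature whose
    total distance to every other candidate is smallest, i.e. the most
    representative feature to store in the fingerprint library."""
    n = len(candidates)
    totals = [sum(dist_fn(candidates[i], candidates[j])
                  for j in range(n) if j != i)
              for i in range(n)]
    return candidates[totals.index(min(totals))]
```

Using the medoid rather than a mean keeps the stored fingerprint an actual observed pose sequence, so outlier recordings of an action do not distort the class representative.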