Abstract

Technical movement analysis requires specialized domain knowledge and the processing of large amounts of data, and AI's strengths in data processing can improve the efficiency of such analysis. In this paper, we propose a feature pyramid network-based temporal action detection (FPN-TAD) algorithm to address a weakness of current video temporal action detection algorithms: the action proposal module has a low recall rate for small-scale temporal target action regions. This paper is divided into three parts. The first part gives an overview of the algorithm; the second part elaborates the network structure and working principle of the FPN-TAD algorithm; and the third part presents the experimental results and analysis of the algorithm.

Highlights

  • In recent years, with the continuous development and application of computer technology and artificial intelligence technology, vision-based human motion analysis technology has developed rapidly and attracted wide attention

  • The most central problem in motion analysis is human pose estimation, an important research task in the field of computer vision. The task of human pose estimation is to identify the human body, locate the positions of the joints of the body parts through computer image processing algorithms, and connect the joint positions into a human skeleton according to the structure of the human body [4]

  • Our experiments show that the proposed method is effective; the experimental process mainly verified the effectiveness of the two optimizations proposed for feature pyramid network-based temporal action detection (FPN-TAD). The evaluation indicators include the area under the curve (AUC) of the average recall (AR) curve, which measures the performance of temporal action proposal generation, and the mean average precision (mAP), which evaluates the performance of temporal action detection
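The AUC of the AR curve mentioned in the last highlight is obtained by integrating average recall over the allowed average number of proposals per video. A minimal sketch of that computation (the function name and the normalization to a 0-100 scale are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def ar_auc(avg_num_proposals, avg_recall):
    """Area under the AR-vs-AN curve, normalized to [0, 100].

    avg_num_proposals: sorted x-axis values (e.g. 1..100 proposals per video)
    avg_recall: average recall at each x value, in [0, 1]
    """
    x = np.asarray(avg_num_proposals, dtype=float)
    y = np.asarray(avg_recall, dtype=float)
    # Trapezoidal integration, normalized by the x-range so a
    # constant recall of 1.0 yields an AUC of exactly 100.
    area = np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0
    return 100.0 * area / (x[-1] - x[0])

# Example: recall saturating as more proposals are allowed per video
an = np.arange(1, 101)
ar = 1.0 - np.exp(-an / 20.0)
print(round(ar_auc(an, ar), 2))
```

A higher AUC indicates that the proposal generator reaches high recall with fewer proposals, which is why it is a standard summary metric for proposal quality.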


Summary

Introduction

With the continuous development and application of computer technology and artificial intelligence technology, vision-based human motion analysis technology has developed rapidly and attracted wide attention. The boundary-sensitive network (BSN) generates candidate proposals on feature maps of a fixed size, which makes multiscale ground-truth target actions difficult to detect. To address this problem, we draw on the idea of the feature pyramid network (FPN) and make predictions on multiscale feature maps, first obtaining feature maps at multiple scales (corresponding to different temporal resolutions of the same video) through the FPN structure. On the ActivityNet1.3 and THUMOS-14 datasets, a significant performance improvement is obtained over the unimproved baseline, reaching the current state-of-the-art level [13].

In this part of the paper, the overall framework of the proposed FPN-based multistage proposal generation temporal action detection algorithm (FPN-TAD) is given first, followed by a detailed description of its key techniques: video feature extraction, FPN action probability evaluation, and multiscale prediction. The FPN-TAD algorithm divides the temporal action detection task into four parts: the front-end video feature extraction network, the feature pyramid (FPN), temporal action proposal generation, and action classification. The first part is the basic video feature extraction, which uses the temporal segment network (TSN), a representative two-stream architecture, to extract a high-dimensional semantic feature representation of the input video; it should be especially noted that the output of the two-stream network is followed by a 2D temporal-channel convolution.
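The bottom-up and top-down pathways described here can be sketched on 1D temporal feature maps as follows. This is a minimal numpy sketch under stated assumptions: average pooling stands in for the paper's strided 2D convolutions, nearest-neighbour upsampling stands in for the learned lateral fusion, and all dimensions are illustrative:

```python
import numpy as np

def downsample(feat):
    """Halve the temporal length by stride-2 average pooling (a stand-in
    for a strided convolution in the bottom-up pathway)."""
    t = feat.shape[0] // 2 * 2
    return feat[:t].reshape(-1, 2, feat.shape[1]).mean(axis=1)

def upsample(feat, length):
    """Nearest-neighbour upsampling of a coarse map back to `length`."""
    idx = (np.arange(length) * feat.shape[0] // length).clip(0, feat.shape[0] - 1)
    return feat[idx]

def build_pyramid(video_feat, levels=3):
    """Bottom-up pathway: progressively coarser temporal feature maps."""
    maps = [video_feat]
    for _ in range(levels - 1):
        maps.append(downsample(maps[-1]))
    return maps

def topdown_fuse(maps):
    """Top-down pathway: fuse each coarser map into the next finer one."""
    fused = [maps[-1]]                       # coarsest level is kept as-is
    for finer in reversed(maps[:-1]):
        fused.append(finer + upsample(fused[-1], finer.shape[0]))
    return fused[::-1]                       # finest-to-coarsest order

feats = np.random.rand(128, 400)   # 128 snippets x 400-dim TSN features
pyramid = topdown_fuse(build_pyramid(feats))
print([p.shape for p in pyramid])  # [(128, 400), (64, 400), (32, 400)]
```

Each level of the resulting pyramid covers the same video at a different temporal resolution, which is what lets the later proposal stage see both short and long actions at a comparable feature scale.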
The second part is the FPN module, which takes the input video feature representation, obtains a multiscale feature pyramid through 2D convolution operations, and uses a top-down approach to fuse the smaller-scale feature maps; the fused maps serve as the input of the third part, the temporal action proposal module. The third part is the temporal action proposal generation module, comprising the temporal evaluation module (TEM), the proposal evaluation module (PEM), and NMS postprocessing. The temporal evaluation module applies one-dimensional convolution along the temporal dimension of each input multiscale feature map and generates three probability curves, for the action region, the start position, and the end position, which represent the probability that the video region corresponding to the current feature is an action, an action start, or an action end, respectively; such action probability curves are obtained for feature maps at every scale. The fourth part is the action classification module, which takes the candidate features generated in the first part according to the proposal positions obtained in the third part and uses the TSN model to classify the candidate action proposals.
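The proposal stage described above can be sketched end to end: pick candidate boundaries from the start/end probability curves, pair and score them, and suppress duplicates with NMS. The peak-picking rule, the scoring by boundary-probability product, and the thresholds below are illustrative simplifications, not the paper's exact TEM/PEM formulation:

```python
import numpy as np

def local_peaks(prob, thresh=0.5):
    """Indices where the curve exceeds `thresh` or is a strict local maximum."""
    peaks = []
    for i in range(len(prob)):
        left = prob[i - 1] if i > 0 else -1.0
        right = prob[i + 1] if i < len(prob) - 1 else -1.0
        if prob[i] > thresh or (prob[i] > left and prob[i] > right):
            peaks.append(i)
    return peaks

def generate_proposals(start_prob, end_prob, max_duration=64):
    """Pair candidate starts with later ends; score each proposal by the
    product of its boundary probabilities (a simplified stand-in for the
    confidence produced by the proposal evaluation module)."""
    proposals = []
    for s in local_peaks(start_prob):
        for e in local_peaks(end_prob):
            if s < e <= s + max_duration:
                proposals.append((s, e, start_prob[s] * end_prob[e]))
    return sorted(proposals, key=lambda p: -p[2])

def t_iou(a, b):
    """Temporal intersection-over-union of two (start, end, ...) proposals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def nms(proposals, iou_thresh=0.5):
    """Greedy non-maximum suppression over score-sorted proposals."""
    keep = []
    for p in proposals:
        if all(t_iou(p, q) < iou_thresh for q in keep):
            keep.append(p)
    return keep

# Synthetic boundary curves: a start peak near t=10, an end peak near t=40.
t = np.arange(100)
start_prob = np.exp(-((t - 10) ** 2) / 8.0)
end_prob = 0.8 * np.exp(-((t - 40) ** 2) / 8.0)
kept = nms(generate_proposals(start_prob, end_prob))
print(kept[0][:2])  # the surviving proposal spans (10, 40)
```

In the full algorithm this procedure runs once per pyramid level, so short actions are caught on the fine-resolution curves and long actions on the coarse ones.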
