Abstract

Temporal action detection is an active research topic in video understanding. The task requires not only locating the start and end times of each action instance in a video, but also predicting the category of each instance. Most current temporal action detection methods work in two steps: first, the video is divided into a series of clips, each clip is judged as to whether it is an action instance, and clips that are not action instances are discarded; second, the remaining segments that may be action instances are classified to obtain the final result. Research on action detection currently faces two main difficulties. 1) Boundaries are unclear. Unlike action recognition, action detection requires precise localization, but the boundaries of real-world actions are often ambiguous. This is also why the mean average precision (mAP) of action detection is currently low. 2) The time span varies widely. Short actions such as waving may last only a few seconds, while long actions such as rock climbing or cycling can last for tens of minutes, which makes proposal extraction extremely difficult. To address these difficulties, this paper proposes a large receptive field boundary matching network (LRFBMN), a model that exploits the relationships between proposals to improve the accuracy of proposal generation. The model consists of two main parts: 1) the clip feature map is processed by large-kernel convolution, and proposal features are then generated by ROI pooling; 2) the proposals are arranged in a fixed order to form a fixed graph, and information is exchanged between graph nodes using dilated convolution. In experiments on the THUMOS14 dataset, the model outperforms the baseline by 2.35% and the state of the art by 1.06%.
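To illustrate why dilated convolution lets distant proposals exchange information, the following is a minimal pure-Python sketch, not the LRFBMN implementation. The abstract does not specify how proposals are ordered, what the kernel weights are, or which dilation rates are used, so the function names (`dilated_conv1d`, `receptive_field`), the scalar proposal scores, the averaging kernel, and the dilation values are all assumptions made for illustration.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int = 1) -> List[float]:
    """Zero-padded, same-length 1D cross-correlation with a dilation rate."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Taps are spaced `dilation` positions apart around position i.
            j = i + (k - half) * dilation
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Input positions visible to one output after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical scalar features for 9 proposals placed in a fixed order;
# only the middle proposal carries a signal.
proposal_scores = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
summing_kernel = [1.0, 1.0, 1.0]  # assumed weights, for illustration only

dense = dilated_conv1d(proposal_scores, summing_kernel, dilation=1)
spread = dilated_conv1d(proposal_scores, summing_kernel, dilation=3)

print(dense)   # the signal reaches only the immediate neighbours
print(spread)  # the signal reaches proposals three positions away
print(receptive_field(3, [1, 2, 4]))
```

With the same three-tap kernel, raising the dilation rate moves information between proposals that are farther apart without adding parameters, and stacking layers with increasing dilation grows the receptive field quickly.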
