Abstract
In recent years, research on multi-object behavior recognition based on pose estimation has made progress. However, in dense scenes, multi-object behavior recognition faces problems such as heavy occlusion, many small-scale objects, and an unbalanced distribution of object classes, and pose-estimation-based methods struggle to adapt to such scenes. Therefore, we adopt an object detection approach to perform multi-object behavior recognition in dense scenes. However, compared with general object detection, fine-grained classification and localization constrain each other because the object classes are highly similar, resulting in poor recognition performance. To solve these problems, we propose a shallow convolutional neural network (CNN) module that extracts task-specific features within a parallel branch structure for the localization and classification tasks, so that the two vision tasks do not constrain each other. Additionally, we propose a feature fusion mechanism that improves the detection of small objects by aggregating multi-scale high-level semantic information. To extract balanced and robust features, we exploit the number of objects in an image to generate dynamic class weights and difficulty weights. Experimental results on dense crowds in student classrooms demonstrate that the proposed method achieves superior performance compared with other state-of-the-art methods, with an mAP50 of 88.72% and an APs of 35.41%.
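To make the parallel branch idea concrete, the sketch below shows one plausible realization of a decoupled detection head in PyTorch: two shallow convolutional branches extract task-specific features for classification and localization in parallel, so the two tasks do not share, and therefore do not constrain, the same feature map. The branch depth, channel widths, and activation are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class DecoupledHead(nn.Module):
    """Minimal sketch of a parallel-branch (decoupled) detection head."""

    def __init__(self, in_channels: int, num_classes: int, num_anchors: int = 1):
        super().__init__()

        def shallow_branch() -> nn.Sequential:
            # "Shallow" is interpreted here as two 3x3 conv layers;
            # the depth used in the paper is an assumption.
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.SiLU(inplace=True),
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.SiLU(inplace=True),
            )

        # Separate feature extractors for each task.
        self.cls_branch = shallow_branch()
        self.loc_branch = shallow_branch()
        # Per-anchor class scores and box offsets (x, y, w, h).
        self.cls_pred = nn.Conv2d(in_channels, num_anchors * num_classes, 1)
        self.loc_pred = nn.Conv2d(in_channels, num_anchors * 4, 1)

    def forward(self, feat: torch.Tensor):
        cls_out = self.cls_pred(self.cls_branch(feat))  # [B, A*C, H, W]
        loc_out = self.loc_pred(self.loc_branch(feat))  # [B, A*4, H, W]
        return cls_out, loc_out


# Usage on one feature-pyramid level (shapes are illustrative):
# head = DecoupledHead(in_channels=256, num_classes=8)
# cls_map, box_map = head(torch.randn(2, 256, 40, 40))
```

In this sketch, only the final 1x1 prediction layers differ in output size; the key point is that the classification and localization predictions are computed from separately learned features rather than from a single shared head.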