Abstract

Deep convolutional neural networks (CNNs) have been successful at automatically extracting features for video object detection. The features extracted from the shallow and deep layers of a CNN differ: shallow features carry low-level semantic information, while deep features carry high-level semantic information. In this paper, we propose an effective feature fusion method, Multi-level Feature Aggregation (MFA), which connects the output layer of each stage to the input layer of the other stages and combines the outputs of all stages at the last layer of the network. This architecture effectively combines shallow and deep features, strengthening the network's feature representation and improving recognition accuracy. MFA is flexible and can be trained end to end. In addition, our experiments show that MFA achieves high object detection accuracy on the DET and VID datasets, and our method achieves mAP on DET and VID.
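
The aggregation pattern described above can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several stated assumptions. Each stage's output is fed to the inputs of all *later* stages, since feeding earlier stages would create a cycle. Feature maps of different resolutions are aligned by average pooling and merged by channel concatenation. Each stage is a toy block (1x1 convolution, ReLU, 2x pooling) with random weights. The helper names `downsample`, `resize_to`, `make_stage`, `build_mfa` and `mfa_forward` are hypothetical.

```python
import numpy as np


def downsample(x, factor):
    """Average-pool a (C, H, W) feature map by an integer factor."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def resize_to(x, size):
    """Pool x down to spatial size (size, size); assumes H is divisible by size."""
    factor = x.shape[1] // size
    return downsample(x, factor) if factor > 1 else x


def make_stage(rng, in_ch, out_ch):
    """Hypothetical stage: 1x1 convolution (channel mixing), ReLU, 2x average pooling."""
    weight = rng.standard_normal((out_ch, in_ch)) * 0.1

    def stage(x):
        y = np.einsum("oc,chw->ohw", weight, x)  # 1x1 convolution
        return downsample(np.maximum(y, 0.0), 2)

    return stage


def build_mfa(in_ch, stage_channels, seed=0):
    """Each stage's input width = network input channels + all earlier stage outputs."""
    rng = np.random.default_rng(seed)
    stages, total = [], in_ch
    for out_ch in stage_channels:
        stages.append(make_stage(rng, total, out_ch))
        total += out_ch
    return stages


def mfa_forward(stages, x):
    """Assumed MFA-style forward pass over a (C, H, W) input."""
    features = [x]  # network input followed by every stage output produced so far
    for stage in stages:
        size = features[-1].shape[1]  # resolution of the current stage input
        # Shallow features are pooled to the current resolution and concatenated.
        h = np.concatenate([resize_to(f, size) for f in features], axis=0)
        features.append(stage(h))
    outputs = features[1:]
    final_size = outputs[-1].shape[1]
    # Last layer: combine the outputs of all stages at the deepest resolution.
    return np.concatenate([resize_to(f, final_size) for f in outputs], axis=0)


stages = build_mfa(in_ch=3, stage_channels=[8, 16, 32])
out = mfa_forward(stages, np.ones((3, 32, 32)))
print(out.shape)  # (56, 4, 4)
```

The final map stacks 8 + 16 + 32 = 56 channels, one group per stage. The deepest stage's features therefore sit alongside pooled copies of the shallower ones, which is the shallow-plus-deep combination the abstract describes. A real detector would use learned convolutional stages and might merge features by summation or learned fusion rather than plain concatenation.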
