Abstract
To better address the scale-variance problem, deep multi-scale methods usually detect objects of different scales with different in-network layers. However, the semantic levels of features from different layers are usually inconsistent. In this paper, we propose a multi-branch, high-level semantic network that gradually splits a base network into multiple branches. As a result, the branches have the same depth, and their output features have similarly high-level semantics. Because their receptive fields differ, the branches are suited to detecting objects of different scales. Moreover, the multi-branch network introduces no additional parameters, since the convolutional weights are shared across branches. To further improve detection performance, skip-layer connections add context to the branch with the relatively small receptive field, and dilated convolution enlarges the resolution of the output feature maps. When these components are embedded into the Faster R-CNN architecture, weighted scores of the proposal-generation and proposal-classification networks are further proposed. Experiments on three pedestrian datasets (i.e., KITTI, Caltech, and CityPersons), one face dataset (i.e., WIDER FACE), and two general object datasets (i.e., COCO and PASCAL VOC) demonstrate the effectiveness and generality of the proposed method. On these datasets, our method achieves state-of-the-art performance.
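The key idea above is that branches sharing the same convolutional weights (and hence the same depth and parameter count) can still cover different object scales if they use different dilation rates, because dilation enlarges the receptive field without adding parameters. The following minimal sketch, using the standard receptive-field formula for stride-1 convolutions (`receptive_field` is a hypothetical helper, not from the paper), illustrates this effect:

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    layers: list of (kernel_size, dilation) tuples.
    For stride-1 convs, RF = 1 + sum((k - 1) * d) over all layers.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Two branches with identical depth and shared 3x3 kernels
# (same parameters), differing only in dilation rate.
branch_a = [(3, 1)] * 4  # dilation 1: small receptive field
branch_b = [(3, 2)] * 4  # dilation 2: larger receptive field

print(receptive_field(branch_a))  # 9  -> suited to small objects
print(receptive_field(branch_b))  # 17 -> suited to large objects
```

Both branches apply the same number of weights per layer, so sharing those weights across branches costs nothing extra, yet the dilation-2 branch sees a much wider context, matching the paper's motivation for scale-specific branches.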
Published in: IEEE Transactions on Circuits and Systems for Video Technology