Abstract

Non-maximum suppression (NMS) is a post-processing step in almost every visual object detector. Its goal is to drastically reduce the number of overlapping candidate regions-of-interest (ROIs) by replacing each group of them with a single, more spatially accurate detection. The default algorithm (Greedy NMS) is fairly simple but suffers from drawbacks that stem from its need for manual tuning. Recently, NMS has been improved using deep neural networks that learn to solve a spatial overlap-based detection rescoring task in a supervised manner, with only ROI coordinates exploited as input. In this paper, neural NMS performance is augmented by feeding the network additional information extracted from the appearance of each candidate ROI. This information captures statistical properties of the spatial distribution of interest-points detected within the corresponding image region. Thus, the deviation in 2D distribution between the interest-points detected inside an ROI that encloses the actual object entirely and those detected inside an ROI that captures it only partially is exploited as a discriminant factor, with the NMS network being implicitly forced to also learn an additional, appearance-based binary classification task (complete vs. partial object silhouettes). Empirical evaluation on three public person detection datasets yields state-of-the-art results at a small computational overhead.
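
For readers unfamiliar with the Greedy NMS baseline mentioned above, the following is a minimal sketch of the commonly used textbook formulation, not the method proposed in the paper. The function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions made for illustration; the `iou_threshold` argument stands in for the manually tuned parameter the abstract refers to.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first.

    iou_threshold is the hand-tuned parameter (the value 0.5 is an
    illustrative assumption, not a value taken from the paper).
    """
    # Visit candidates in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a candidate only if it does not overlap any already-kept
        # box by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores, iou_threshold=0.5))
    print(greedy_nms(boxes, scores, iou_threshold=0.7))
```

In this toy input the first two boxes overlap heavily, so whether the second one survives depends entirely on the chosen threshold. Note also that this baseline uses only box coordinates and scores; it has no access to the appearance cues the paper adds.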
