Abstract

Object proposal algorithms have been shown to be very successful in accelerating the object detection process. High object localization quality and detection recall can be obtained using thousands of proposals. However, performance with only a small number of proposals remains unsatisfactory. This paper demonstrates that the performance of a few proposals can be significantly improved with minimal human interaction: a single touch point. To this end, we first generate hierarchical superpixels using an efficient tree-organized structure as our initial object proposals, and then select only a few of them by learning an effective convolutional neural network (CNN) for objectness ranking. We explore and design an architecture that integrates the human interaction with global information from the whole image for objectness scoring, which significantly improves performance when only a minimal number of object proposals is used. Extensive experiments show that the proposed method outperforms all state-of-the-art methods at locating the meaningful object under the touch-point constraint. Furthermore, the proposed method is extended to video. By combining it with a novel interactive motion segmentation cue for generating hierarchical superpixels, the method achieves satisfactory performance with a single proposal and can be used in interactive vision systems, for example to select the input of a real-time tracking system.
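The touch-point constraint on proposal selection can be illustrated with a minimal sketch. It assumes each proposal is an axis-aligned bounding box with a precomputed objectness score; the names `Proposal`, `contains`, and `select_proposals` are hypothetical, and the score is only a stand-in for the CNN output. Unlike the architecture described in the abstract, which feeds the touch point into the network together with global image information, this simplification applies the touch point as a hard filter and then ranks the remaining proposals by score.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open: x0 <= x < x1


@dataclass
class Proposal:
    box: Box
    score: float  # hypothetical objectness score, stand-in for a CNN output


def contains(box: Box, point: Tuple[int, int]) -> bool:
    """Return True if the touch point lies inside the box."""
    x0, y0, x1, y1 = box
    x, y = point
    return x0 <= x < x1 and y0 <= y < y1


def select_proposals(proposals: List[Proposal],
                     touch: Tuple[int, int],
                     k: int = 1) -> List[Proposal]:
    """Keep proposals covering the touch point, then return the top-k by score."""
    candidates = [p for p in proposals if contains(p.box, touch)]
    candidates.sort(key=lambda p: p.score, reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    props = [
        Proposal((0, 0, 10, 10), 0.9),
        Proposal((20, 20, 40, 40), 0.95),  # highest score, but misses the touch
        Proposal((0, 0, 50, 50), 0.5),
        Proposal((2, 2, 8, 8), 0.7),
    ]
    for p in select_proposals(props, touch=(5, 5), k=2):
        print(p.box, p.score)
```

In this toy run the highest-scoring box is discarded because it does not cover the touch point, which is the intuition behind using a single touch to cut the candidate set down to a few proposals.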
