Abstract

Learning object detectors from weak image annotations is an important yet challenging problem. Many weakly supervised approaches formulate the task as a multiple instance learning (MIL) problem, in which each image is represented as a bag of instances. To predict the score for each object that occurs in an image, existing MIL-based approaches tend to select the instance that responds most strongly to a specific class, which overlooks contextual information. In addition, objects often exhibit dramatic variations, such as changes in scale and other transformations, which make them hard to detect. In this paper, we propose the weakly supervised group mask network (WSGMN), which has two distinctive properties: (i) it exploits the relations among regions to generate community instances, which contain context information and are robust to object variations; and (ii) it generates a mask for each label group and uses these masks to dynamically select the feature information of the most useful community instances for recognizing specific objects. Extensive experiments on several benchmark datasets demonstrate the effectiveness of WSGMN for weakly supervised object detection.
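
As an illustration of the baseline behaviour criticized above, the following minimal sketch shows max-based instance selection in a MIL bag. It is not the WSGMN method. The function names (`mil_image_scores`, `selected_instances`) and the example scores are hypothetical and chosen only for illustration.

```python
from typing import List


def mil_image_scores(instance_scores: List[List[float]]) -> List[float]:
    """Image-level score per class: the maximum over all instances in the bag.

    instance_scores[i][c] is the (assumed) score of instance i for class c.
    """
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    num_classes = len(instance_scores[0])
    return [max(row[c] for row in instance_scores) for c in range(num_classes)]


def selected_instances(instance_scores: List[List[float]]) -> List[int]:
    """Index of the single instance that responds most strongly to each class."""
    num_classes = len(instance_scores[0])
    return [
        max(range(len(instance_scores)), key=lambda i: instance_scores[i][c])
        for c in range(num_classes)
    ]


# Hypothetical bag: three region proposals scored against two classes.
scores = [
    [0.10, 0.70],
    [0.80, 0.20],
    [0.75, 0.65],
]
image_scores = mil_image_scores(scores)   # one score per class
winners = selected_instances(scores)      # one instance index per class
```

In this toy bag, the third instance scores highly for both classes but is never selected, so whatever context it carries is discarded. This is the kind of information loss that aggregating over related regions is meant to avoid.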
