Abstract

Recently, deep learning has been successfully applied to object detection and localization tasks in images. When deep learning frameworks are trained in a supervised manner on large datasets, strongly labeling the objects facilitates good performance; however, the complexity of image scenes and the large size of the datasets make this a laborious task. Hence, it is of paramount importance to reduce the expensive work associated with strong labeling tasks such as bounding box annotation. In this paper, we propose a method that performs object localization without bounding box annotation in the training process by employing a two-path activation-map-based classifier framework. In particular, we develop an activation-map-based framework that judiciously controls the attention map in the perception branch by adding a two-feature extractor, so that better attention weights can be distributed to improve performance. The experimental results show that our method surpasses existing deep learning models for weakly supervised object localization, achieving the best performance with 75.21% Top-1 classification accuracy and 55.15% Top-1 localization accuracy on the CUB-200-2011 dataset.
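As a reading aid for the reported numbers, the sketch below shows how Top-1 localization accuracy is conventionally computed in WSOL benchmarks: an image counts as correct only if the top-1 predicted class matches the label and the predicted box overlaps the ground-truth box with an intersection over union (IoU) of at least 0.5. The abstract does not state the IoU threshold, so the 0.5 default is an assumption based on common practice, as is the single ground-truth box per image (CUB-200-2011 provides one box per image). The function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def top1_loc_accuracy(pred_labels: Sequence[int], true_labels: Sequence[int],
                      pred_boxes: Sequence[Box], true_boxes: Sequence[Box],
                      iou_thresh: float = 0.5) -> float:
    """Fraction of images whose top-1 class is correct AND whose predicted box
    has IoU >= iou_thresh with the (assumed single) ground-truth box."""
    n = len(true_labels)
    if n == 0 or not (len(pred_labels) == len(pred_boxes) == len(true_boxes) == n):
        raise ValueError("inputs must be non-empty and equally long")
    hits = sum(1 for p, t, pb, tb in zip(pred_labels, true_labels, pred_boxes, true_boxes)
               if p == t and iou(pb, tb) >= iou_thresh)
    return hits / n
```

Because every localization hit also requires a correct class, Top-1 localization accuracy can never exceed Top-1 classification accuracy, which is consistent with the 55.15% versus 75.21% figures above.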

Highlights

  • Sub-optimal localization problem: when the CAM [1] method is used, birds’ tails, wings, and legs are not localized

  • Deep learning has been successfully applied to object detection and localization tasks in images

  • In object detection, when an image is given as input to a deep learning model, humans are able to understand which part of the image the model looked at to produce its result
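The CAM [1] limitation in the first highlight can be made concrete. CAM computes, for a class c, a weighted sum of the last convolutional feature maps, M_c(x, y) = Σ_k w_k^c · f_k(x, y), where w_k^c are the classifier weights applied after global average pooling; a box is then drawn around the high-activation region. The pure-Python sketch below uses hypothetical function names and a simplification (it boxes every cell above a fixed fraction of the min-max normalized map instead of the largest connected component). It shows how a map dominated by one discriminative part, such as a bird's head, yields a box that misses the rest of the object.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # H x W grid of activations


def class_activation_map(features: List[FeatureMap], fc_weights: List[float]) -> FeatureMap:
    """CAM for one class c: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    features:   K feature maps from the last conv layer, each H x W
    fc_weights: the K classifier weights w_k^c for class c
    """
    if len(features) != len(fc_weights):
        raise ValueError("need one weight per feature map")
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for f_k, w_k in zip(features, fc_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += w_k * f_k[y][x]
    return cam


def cam_to_box(cam: FeatureMap, ratio: float = 0.2) -> Tuple[int, int, int, int]:
    """Min-max normalize the CAM, keep cells with value >= ratio, and return the
    enclosing box (x1, y1, x2, y2) in inclusive cell coordinates.
    Simplification: uses all surviving cells, not the largest connected component."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must lie in (0, 1]")
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    kept = [(x, y) for y, row in enumerate(cam) for x, v in enumerate(row)
            if span == 0 or (v - lo) / span >= ratio]
    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # Toy 3x4 example: channel 0 fires strongly on one "head" cell,
    # channel 1 responds weakly along a "body" row.
    feats = [
        [[4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
        [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
    ]
    cam = class_activation_map(feats, [1.0, 0.5])
    print(cam_to_box(cam, 0.2))  # (0, 0, 0, 0): only the "head" cell survives
    print(cam_to_box(cam, 0.1))  # (0, 0, 3, 1): the weak "body" cells are included
```

With the usual threshold, the weak responses on the rest of the object fall below the cut, so the box covers only the most discriminative part; lowering the threshold recovers them in this toy case but tends to admit background in real images.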

Summary

Sub-optimal localization problem: when the CAM [1] method is used, birds’ tails, wings, and legs are not localized. To solve this problem, a previous approach [8] proposed applying the regional dropout method to the input data, and several papers, including [11], have proposed various other methods for the sub-optimal localization problem. However, our proposed method solves the sub-optimal localization problem more efficiently than the existing methods: it minimizes the number of learnable parameters and adds no extra layer to address the sub-optimal localization problem of WSOL, yet it outperforms the existing methods. In addition, some existing studies must feed the same input into the network several times at the inference stage to obtain the attention result for one image, whereas our method obtains the attention from a single input.
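The summary states that [8] applies a regional dropout method to the input data but does not describe the scheme. As an illustration only, the sketch below assumes a Hide-and-Seek-style variant: the image is divided into a grid of square patches, and during training each patch is replaced by a fill value independently with probability p, so the classifier cannot rely on a single discriminative part. The function name, patch size, and fill value are hypothetical choices, not details from the source.

```python
import random
from typing import List, Optional

Image = List[List[float]]  # single-channel H x W image, for simplicity


def hide_patches(image: Image, patch: int, p: float,
                 fill: float = 0.0, rng: Optional[random.Random] = None) -> Image:
    """Regional dropout on the input (training-time augmentation).

    Splits `image` into patch x patch grid cells and replaces each cell with
    `fill` independently with probability `p`. Returns a new image; the input
    is not modified. Border cells may be smaller than patch x patch.
    """
    if patch <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("patch must be positive and p must lie in [0, 1]")
    rng = rng if rng is not None else random.Random()
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            if rng.random() < p:  # hide this grid cell
                for y in range(top, min(top + patch, h)):
                    for x in range(left, min(left + patch, w)):
                        out[y][x] = fill
    return out
```

Hiding is applied only during training; at inference the full, unmodified image is passed through the network. In Hide-and-Seek the fill value is typically the dataset's mean pixel value rather than 0, to keep training and test input statistics close.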

Class Activation Map
Attention Mechanism
Weakly Supervised Object Localization
Proposed WSOL Method with Adjusted Weights
Experiments
Result
Method
Discussion and Conclusions