Weakly supervised object detection (WSOD) is attracting growing attention in remote sensing image (RSI) analysis. It aims to train an object detector given only image-level annotations (i.e., the categories of objects present in an image), reducing the dependency on expensive object-level annotations. However, an object detector trained on such weak annotations is susceptible to background interference and missed detections because of the complex backgrounds and densely arranged objects in RSIs. In this paper, we develop a novel Multi-view nOisy Learning framework, named MOL, to tackle these problems. It consists of two sequential stages: reliable object discovery and progressive object mining. (i) In reliable object discovery, we formulate WSOD as a multiple instance learning problem to discover potential objects in the given image. This process inevitably suffers from background interference because it lacks an accurate supervision signal. To address this, we take inspiration from noisy learning theory and propose a temporal consistency-based instance selection strategy for discovering reliable foreground objects, reducing the risk of background interference. (ii) In progressive object mining, we use the discovered reliable objects as initial pseudo-labels for training an object detector and propose a novel multi-view object mining strategy that progressively mines neglected objects from multiple distinct views, alleviating the missed detection issue. The result is a well-trained object detector that achieves satisfactory performance on RSIs. Experimental results on two public benchmarks demonstrate that our method outperforms previous state-of-the-art methods by 13.97% mAP and 1.69% mAP on NWPU VHR-10.v2 and DIOR, respectively. The code is available at: https://github.com/GC-WSL/MOL.