Weakly supervised object detection (WSOD) in remote sensing images (RSIs) remains challenging because instance-level labels are unavailable, and many existing methods suffer from two problems. First, most existing methods mine pseudo ground truth (PGT) instances by relying solely on proposal class scores (PCS). However, PCS is not reliable enough because of the bird's-eye-view imaging and large-scale cluttered backgrounds of RSIs, and instances with high PCS tend to cover only the most discriminative region rather than the whole object. Second, existing methods assign a one-hot label to each instance and copy the label of a PGT instance to its neighboring instances, which causes some misclassification. In fact, a neighboring instance is less likely than the PGT instance itself to contain an object of the same category. For the first problem, a proposal quality score (PQS), which combines PCS with a dual-context projection score (DCPS), is proposed to mine high-quality PGT instances. The DCPS is calculated through semantic segmentation and measures how completely each proposal covers an object. For the second problem, a pseudo soft label assignment (PSLA) strategy is proposed to assign a more precise soft label to each instance, where the soft label is determined by the spatial distance between the instance and its nearest PGT instance. An ablation study validates the effectiveness of PQS and PSLA, and comprehensive comparisons with other WSOD methods on three popular benchmarks demonstrate the excellent performance of our method.
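The abstract gives no formulas for PQS or PSLA, so the following Python sketch is only a rough, hypothetical illustration of the two ideas, not the authors' implementation. It assumes that PQS is the product of PCS and DCPS, treats DCPS as a given per-proposal value in [0, 1] (the semantic-segmentation step that produces it is not modeled), and uses 1 - IoU as the "spatial distance" to the nearest PGT instance. The names `box_iou`, `mine_pgt`, `soft_labels`, and `fg_thresh`, and the toy numbers, are all made up for illustration.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box [x1, y1, x2, y2] and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def mine_pgt(pcs, dcps, image_labels):
    """Return {class: proposal index}, one PGT instance per image-level class.

    Hypothetical PQS = PCS * DCPS; the paper's actual combination may differ.
    """
    pqs = pcs * dcps[:, None]  # (N, C) quality scores
    return {c: int(np.argmax(pqs[:, c])) for c in image_labels}


def soft_labels(boxes, pgt, num_classes, fg_thresh=0.5):
    """Hypothetical distance-based soft labels; the last column is background.

    Distance is assumed to be 1 - IoU with the nearest PGT instance, so a
    proposal puts (1 - distance) on that PGT's class and the rest on background,
    instead of copying a one-hot label. Assumes pgt is non-empty.
    """
    classes = list(pgt)
    # (N, K) overlap of every proposal with every PGT instance
    overlaps = np.stack([box_iou(boxes[pgt[c]], boxes) for c in classes], axis=1)
    nearest = overlaps.argmax(axis=1)  # index of the nearest PGT per proposal
    best = overlaps.max(axis=1)        # 1 - distance to that PGT
    labels = np.zeros((len(boxes), num_classes + 1))
    for i, (k, w) in enumerate(zip(nearest, best)):
        if w >= fg_thresh:
            labels[i, classes[k]] = w
            labels[i, num_classes] = 1.0 - w
        else:
            labels[i, num_classes] = 1.0  # too far from any PGT: background
    return labels


# Toy example: box 1 has the highest class-0 PCS but covers the object less completely.
boxes = np.array([[0, 0, 10, 10], [0, 0, 10, 8], [20, 20, 30, 30], [50, 50, 60, 60]], float)
pcs = np.array([[0.9, 0.1], [0.95, 0.1], [0.1, 0.8], [0.2, 0.3]])
dcps = np.array([0.9, 0.5, 0.7, 0.2])
pgt = mine_pgt(pcs, dcps, image_labels=[0, 1])  # {0: 0, 1: 2}
labels = soft_labels(boxes, pgt, num_classes=2)  # row 1 is about [0.8, 0.0, 0.2]
```

In this toy case, ranking by PCS alone would pick box 1 for class 0, while the assumed PQS picks the more complete box 0; box 1 then receives a partial class-0 label of 0.8 rather than a copied one-hot label.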