Abstract

In real-world scenarios, data generally follow a long-tailed distribution. Although many works on long-tailed object detection have emerged, they still suffer from insufficient discriminative learning for tail categories. To alleviate this problem, we devise a novel Class-guided Triple Head Prediction Network (CTHNet). Because the long-tailed LVIS dataset contains frequent, common, and rare classes, we propose Triple Box Heads (TBH) to handle these three class groups, enhancing discriminative representations for all classes. To train the TBH, we adopt a decoupled training strategy and introduce a novel Generic Class-Specific Loss (GCSL) that adaptively protects rare classes based on their learning status. Experiments indicate that CTHNet not only outperforms existing methods but also significantly improves the performance of both one-stage and two-stage detectors. Specifically, CTHNet improves RetinaNet and Faster R-CNN by 5.7% and 4% mAP on LVIS v0.5, respectively, with gains of 12% and 8% AP on rare classes.


Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call