Abstract

Object detection on long-tailed, large-scale datasets is practical and challenging, yet remains under-explored. Recently proposed methods mainly focus on mitigating the class-imbalance problem in classification, and only a few attempts have considered the quality of the predicted bounding boxes. Inspired by an observation about the existing Cascade architecture, namely that "detectors with specific IoU thresholds excel at different label frequencies of bounding boxes," this paper first pinpoints a localization issue under long-tailed distributions: a detector may predict inaccurate bounding boxes for categories with fewer training samples, and the visual features extracted from those boxes can in turn degrade classification accuracy. As a result, bounding-box accuracy differs substantially among categories in a long-tailed distribution. We introduce a Multi-Expert Cascade (MEC) framework that readjusts the weight of each category during training via a multi-expert loss. Furthermore, we leverage dynamic ensemble mechanisms at inference time to fully utilize the expert detectors and achieve better performance. Extensive experiments on the recent long-tailed, large-vocabulary object detection dataset show that the proposed MEC framework significantly improves the performance of most widely used detectors across various backbones on object detection and instance segmentation tasks.
