• We expose the flaws of two-stage object detection methods.
• A novel two-stage object detection paradigm is presented.
• An ablation study illustrates the effect of the different elements of the framework.
• State-of-the-art methods are outperformed in both accuracy and parameter count.

Modeling feature interaction patterns is of significant importance to object detection tasks. However, reasoning about the relationships between instance features is challenging in two-stage detectors because of their overuse of hand-crafted components. To tackle this problem, we analyze three levels of feature interaction: the dependency between the cropped local features and the global features, the feature autocorrelation within an instance, and the cross-correlation between instances. To this end, we propose a more compact object detector head network (CODH), which not only preserves global context information and condenses the information density, but also allows instance-wise feature enhancement and relational reasoning in a larger matrix space. Without bells and whistles, our method effectively improves detection performance while significantly reducing the number of model parameters; e.g., with our method, the parameters of the head network are 0.6× smaller than those of the state-of-the-art Cascade R-CNN, yet the performance boost is 1.3% on COCO test-dev. Without loss of generality, we can also build a lighter head network for other multi-stage detectors by integrating our method into them, including Faster R-CNN, Libra R-CNN, and Double Head R-CNN.