Global remote feature modulation end-to-end detection

Xiaoan Bao, Wenjing Yi, Xiaomei Tu, Na Zhang, Qingqi Zhang, Yuting Jin, Biao Wu

doi:10.1038/s41598-024-68500-w

Abstract

Object detectors based on fully convolutional networks achieve excellent performance. However, existing detection algorithms still face two challenges: low detection accuracy in dense scenes and occlusion among densely packed targets. To address these challenges, we propose a Global Remote Feature Modulation End-to-End (GRFME2E) detection algorithm. In the feature extraction phase of our algorithm, we introduce the Concentric Attention Feature Pyramid Network (CAFPN). By combining Coordinate Attention and a Multilayer Perceptron, the CAFPN captures direction-aware and position-sensitive information, as well as global remote dependencies of features in deep layers. These features are used to modulate the front-end shallow features, enhancing inter-layer feature adjustment to obtain comprehensive and distinctive feature representations. In the detector part, we introduce the Two-Stage Detection Head (TS Head). This head employs the First-One-to-Few (F-O2F) module to detect unobstructed or slightly occluded objects. It then uses masks to suppress the instances already detected and passes the result to the Second-One-to-Few (S-O2F) module, which identifies heavily occluded objects. The results from both detection stages are merged to produce the final output, so that objects are detected whether they are unobstructed, slightly occluded, or heavily occluded. Experimental results on the pig detection dataset demonstrate that our GRFME2E achieves an accuracy of 98.4%. In addition, more extensive experimental results show that on the CrowdHuman dataset, our GRFME2E achieves 91.8% and outperforms other methods.
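
The detect, mask, detect again, merge flow of the TS Head can be illustrated with a toy sketch. This is not the authors' implementation: the F-O2F and S-O2F modules are learned network components, whereas the sketch below assumes a plain grid of confidence scores and fixed thresholds. All names (`detect`, `suppress`, `two_stage_detect`, `first_thr`, `second_thr`, `radius`) and the example values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]
Point = Tuple[int, int]


def detect(scores: Grid, threshold: float) -> List[Point]:
    """Return the (row, col) cells whose score meets the threshold."""
    return [
        (r, c)
        for r, row in enumerate(scores)
        for c, value in enumerate(row)
        if value >= threshold
    ]


def suppress(scores: Grid, detected: List[Point], radius: int = 1) -> Grid:
    """Return a copy of the grid with a square window zeroed around each detection."""
    masked = [row[:] for row in scores]
    for r0, c0 in detected:
        for r in range(max(0, r0 - radius), min(len(masked), r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(len(masked[r]), c0 + radius + 1)):
                masked[r][c] = 0.0
    return masked


def two_stage_detect(
    scores: Grid,
    first_thr: float = 0.7,   # assumed threshold for the first stage
    second_thr: float = 0.3,  # assumed, lower threshold for the second stage
    radius: int = 1,          # assumed size of the suppression window
) -> List[Point]:
    """Toy analogue of a two-stage head: detect, mask, detect again, merge."""
    first = detect(scores, first_thr)          # strong, clearly visible responses
    masked = suppress(scores, first, radius)   # hide instances already found
    second = detect(masked, second_thr)        # weaker responses that remain
    return first + second                      # merge both stages


# Hypothetical score map: one strong peak with a weaker halo, plus one weak peak.
example_scores = [
    [0.9, 0.4, 0.0, 0.0],
    [0.4, 0.2, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
detections = two_stage_detect(example_scores)
print(detections)
```

In this toy setting, simply lowering the threshold to 0.3 in a single pass would also report the halo cells around the strong peak as separate detections. Masking the first-stage result before the second pass leaves only the weak response that belongs to a different location, which is the intuition behind suppressing detected instances before searching for heavily occluded ones.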

Full Text