Scene text detection is crucial across numerous application fields. However, although real-time performance is emphasized in scene text detection, most existing detection models use the Feature Pyramid Network (FPN) for feature extraction and often disregard its inherent limitations. Integrating high-resolution, multi-channel features into FPN requires substantial computational resources. Moreover, although FPN treats local and global features equally and performs stably across many applications, its suitability for text-specific features is questionable. To address these issues, we propose the Asymmetric Center Positioning Network (ACP-Net) as a replacement for FPN, achieving accurate, real-time text detection in complex scenarios. ACP-Net adopts an asymmetric feature structure with independent branches for global and local information, together with an adaptive weighted fusion module that effectively captures long-range dependencies. In addition, a text center positioning module improves the understanding of text features by learning feature centers. Comprehensive evaluations on various terminal devices confirmed ACP-Net's superior accuracy and speed.
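The abstract does not specify how the adaptive weighted fusion module combines the global and local branches. The following is a purely illustrative sketch, not the ACP-Net module. It assumes a generic "fast normalized fusion" scheme, in which non-negative weights are normalized to sum to roughly one and used to blend same-shaped branch outputs, as popularized by BiFPN. The function name `adaptive_weighted_fusion`, the flat-list feature representation, and the `eps` parameter are hypothetical. In a real network the raw weights would be learnable parameters updated by backpropagation. Here they are simply passed in.

```python
from typing import List, Sequence


def adaptive_weighted_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Blend same-shaped branch features with normalized non-negative weights.

    Hypothetical sketch of a fast normalized fusion; not the ACP-Net module.
    """
    if not features or len(features) != len(raw_weights):
        raise ValueError("need one weight per (non-empty) feature list")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all branch features must have the same length")

    # ReLU keeps every weight non-negative so no branch is subtracted.
    weights = [max(0.0, w) for w in raw_weights]
    # Normalize so the weights sum to (almost) one; eps avoids division by zero.
    total = sum(weights) + eps
    norm = [w / total for w in weights]

    # Element-wise weighted sum over branches.
    return [sum(n * f[i] for n, f in zip(norm, features)) for i in range(length)]


# Usage: fuse a hypothetical global-branch and local-branch feature vector.
global_feat = [1.0, 2.0, 3.0]
local_feat = [3.0, 2.0, 1.0]
fused = adaptive_weighted_fusion([global_feat, local_feat], [1.0, 1.0])
```

With equal weights, each fused element is close to the mean of the two branches. A negative raw weight is clipped to zero, which effectively switches that branch off.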