Salient Guided Text Detection in E-Commerce Images
Text detection is a fundamental task in computer vision that involves identifying and localizing text within images or videos. It has been studied extensively, but most approaches are tailored to open-scene text, and few studies address practical domains such as e-commerce. E-commerce images are designed to capture human attention, and effective text detection can amplify this marketing strategy. Yet detecting text in e-commerce images poses particular challenges, because their visual attributes differ markedly from those of open-scene images. This paper addresses this gap by exploring how human attention can aid text detection in e-commerce images. The proposed model merges high-level text features with low-level and saliency features, exploiting both local and semantic characteristics of image regions. The low-level and saliency features, which capture visual cues, are used to predict a saliency map, which in turn guides text detection. The proposed method localizes text more accurately than current state-of-the-art models on the SalECI e-commerce benchmark dataset. The code for this study is available at https://github.com/bebbieyin/SalientTextDet.
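To make the idea of saliency-guided detection concrete, the following is a minimal, dependency-free sketch of how a predicted saliency map could reweight a per-pixel text score map before thresholding. It is an illustration only and not the model proposed in this paper, which fuses features inside a network. The function names `fuse_with_saliency` and `threshold`, the multiplicative fusion rule, and the parameters `alpha` and `tau` are all assumptions introduced here.

```python
# Hypothetical illustration: reweight text scores by saliency, then binarize.
# Maps are plain nested lists of floats in [0, 1] with identical shapes.

def fuse_with_saliency(text_scores, saliency, alpha=0.5):
    """Boost text scores in salient regions (assumed rule, not the paper's).

    Each score t is scaled by (1 + alpha * s) and clipped to 1.0, so pixels
    with zero saliency keep their original score.
    """
    if len(text_scores) != len(saliency):
        raise ValueError("maps must have the same height")
    fused = []
    for t_row, s_row in zip(text_scores, saliency):
        if len(t_row) != len(s_row):
            raise ValueError("maps must have the same width")
        fused.append(
            [min(1.0, t * (1.0 + alpha * s)) for t, s in zip(t_row, s_row)]
        )
    return fused


def threshold(score_map, tau=0.5):
    """Convert a score map into a binary text mask using cutoff tau."""
    return [[1 if v >= tau else 0 for v in row] for row in score_map]


if __name__ == "__main__":
    text_scores = [[0.4, 0.4], [0.9, 0.1]]
    saliency = [[1.0, 0.0], [0.0, 1.0]]
    mask = threshold(fuse_with_saliency(text_scores, saliency))
    print(mask)
```

In this toy input, the two pixels with a text score of 0.4 are separated only by saliency: the salient one is lifted above the cutoff while the non-salient one is not, which is the intuition behind letting attention cues aid localization.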