Abstract

Scene text spotting is a challenging multi-task problem of locating and recognizing text in complex scenes. Existing end-to-end text spotters generally adopt sequentially decoupled multi-task pipelines consisting of text detection and text recognition modules. Although customized modules are designed to connect the tasks closely, there is no interaction among the tasks, so information that could benefit overall text spotting is lost. Moreover, the independent and sequential pipeline is unidirectional, accumulating errors from earlier tasks to later ones. In this paper, we propose CommuSpotter, which enhances multi-task communication by explicitly and concurrently sharing compatible information throughout scene text spotting. To address task-specific inconsistencies, we propose a Conversation Mechanism (CM) that extracts expertise from each task and exchanges it with the others. Specifically, the detection task is rectified by the recognition task to filter out duplicated results and false positives, while the recognition task is corrected by the rectified detection task to replenish missing characters and suppress non-text interference. This communication compensates for the missing interaction and breaks the sequential pipeline of error propagation. In addition, we adopt text semantic segmentation in the recognition task, which reduces the need for complex customized modules and the corresponding extra annotations. Experimental results show that our method achieves results competitive with state-of-the-art methods while remaining computationally efficient.
