Abstract

To address the loss of robustness caused by scale variation and imbalanced class distribution in scene text detection, we propose a new two-stream fusion framework, TSFnet. It consists of the Detection Stream, the Judge Stream, and the Merge output algorithm. In the Detection Stream, we propose a loss balance factor (LBF), which is used to optimize the region proposal network. The Regression-net and the Segmentation-net then predict the global text segmentation map and its corresponding coordinates and probability score. In the Judge Stream, we use a feature pyramid network to extract the Judge map. In this process, the LBF is calculated to support the Detection Stream. Finally, we design a novel algorithm that fuses the outputs of the two streams and localizes the precise position of the text. Extensive experiments are conducted on the ICDAR 2015 and ICDAR2017-MLT standard datasets. The results demonstrate that the framework's performance is comparable with state-of-the-art approaches.
