Abstract

Text detection in complex scenes is difficult because text varies widely in distribution, orientation, and layout. This paper proposes an end-to-end scene text detection method that combines a parallel backbone network with region segmentation. Multiple deformable convolutions extract features of multi-dimensional text regions, from which candidate regions of different sizes are generated and each is assigned a corresponding state. Experiments show that, compared with the baseline, the method adapts better to variation in the shape and angle of targets in the image, which otherwise reduces detection accuracy.
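
To illustrate why deformable convolutions help with text of varying shape and angle, the following is a minimal sketch of the general idea, not the network described in this paper. A standard 3x3 convolution samples a fixed grid, while a deformable one shifts each sampling point by an offset and reads the feature map by bilinear interpolation. The function names `bilinear` and `deform_conv_at` are hypothetical, and the offsets are passed in by hand here; in a trained model they would be predicted by an additional convolutional branch.

```python
import math


def bilinear(img, y, x):
    """Sample a 2-D list `img` at fractional (y, x); points outside read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    # Blend the four integer neighbours, weighted by distance to (y, x).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                val += (1 - abs(y - yi)) * (1 - abs(x - xi)) * img[yi][xi]
    return val


def deform_conv_at(img, weights, offsets, cy, cx):
    """Response of a 3x3 deformable convolution centred at (cy, cx).

    weights: 9 kernel weights in row-major order.
    offsets: 9 (dy, dx) pairs, one per kernel tap (assumed given here).
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), wk, (oy, ox) in zip(grid, weights, offsets):
        # Regular grid position plus the learned displacement.
        out += wk * bilinear(img, cy + gy + oy, cx + gx + ox)
    return out


feature_map = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
kernel = [1.0] * 9
regular = deform_conv_at(feature_map, kernel, [(0.0, 0.0)] * 9, 1, 1)
shifted = deform_conv_at(feature_map, kernel, [(0.0, 0.5)] * 9, 1, 1)
```

With all offsets set to zero the result equals an ordinary 3x3 convolution; non-zero offsets let the sampling pattern follow curved or rotated text instead of a rigid square.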
