Bottom-Up Scene Text Detection with Markov Clustering Networks

Zichuan Liu, Guosheng Lin, Wang Ling Goh

doi:10.1007/s11263-020-01298-y

Abstract

A novel detection framework named Markov Clustering Network (MCN) is proposed for fast and robust scene text detection. Unlike traditional top-down scene text detection approaches, which are inherited from classic object detection, MCN detects scene text objects in a bottom-up manner. MCN predicts instance-level bounding boxes by first converting an image into a stochastic flow graph and then performing Markov Clustering on the predicted stochastic flows. The stochastic flows encode the local correlation and semantic information of scene text objects. An object is modeled as a set of nodes strongly connected by flows, which allows flexible, bottom-up detection of scale-varying and rotated text objects without prior knowledge of object size. Flow prediction is supported by advanced Convolutional Neural Network architectures and a position-aware spatial attention mechanism, which enhances flow prediction by adaptively fusing spatial representations. Experimental evaluation on public benchmarks shows that MCN achieves state-of-the-art performance, especially in retrieving long and oriented text.
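To make the clustering step concrete, the following is a minimal sketch of generic Markov Clustering (alternating expansion and inflation on a column-stochastic matrix) in Python with NumPy. It is not the MCN implementation: the function name `markov_cluster`, its parameters, and the hand-written flow matrix are illustrative assumptions, and in MCN the flows would be predicted by the network rather than specified by hand.

```python
import numpy as np

def markov_cluster(flow, expansion=2, inflation=2.0, max_iter=100, tol=1e-8, eps=1e-6):
    """Generic Markov Clustering sketch (hypothetical helper, not the paper's code).

    flow[i, j] is the non-negative flow from node j to node i. Every column is
    assumed to have positive total flow (for example, each node has a self-loop).
    """
    m = np.asarray(flow, dtype=float)
    m = m / m.sum(axis=0, keepdims=True)          # make columns sum to 1
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow along longer walks
        m = m ** inflation                        # inflation: strengthen strong flows
        m = m / m.sum(axis=0, keepdims=True)      # re-normalize columns
        if np.abs(m - prev).max() < tol:          # stop once the matrix is stable
            break
    # Attractors keep mass on the diagonal; each attractor row lists the nodes it attracts.
    clusters = set()
    for i in range(m.shape[0]):
        if m[i, i] > eps:
            clusters.add(tuple(np.flatnonzero(m[i] > eps).tolist()))
    return sorted(clusters)

# Toy graph (assumed for illustration): nodes 0-2 and nodes 3-4 form two
# groups that are strongly connected internally and share no flow.
flow = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
])
print(markov_cluster(flow))  # [(0, 1, 2), (3, 4)]
```

In this toy case each strongly connected group of nodes comes out as one cluster, which mirrors how the abstract describes a text object as a set of nodes strongly connected by flows. Turning a cluster into a bounding box is a separate step that this sketch does not cover.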

Full Text