Abstract

Inspired by instance segmentation algorithms, researchers have proposed quantity of segmentation-based methods for text detection, achieving remarkable results on scene text with arbitrary orientation and large aspect ratios. Following their success, we believe cascade architecture and extracting contextual information in multiple aspects are powerful to boost performance on the basis of segmentation-based methods, especially in decreasing false positive texts in complex natural scene. Based on such consideration, we propose a multiple-context-aware and cascade CNN structure, which appropriately encodes multiple categories of context information into a cascade R-CNN framework. Specifically, the proposed method consists of two stages, i.e., feature generation and cascade detection. During the first stage, we define ISTK (Isolated Selective Text Kernel) module to refine feature map, which sequentially encodes channel-wise and kernel-size attention information by designing multiple branches and different kernel sizes in isolate form. Afterwards, we build long-range spatial dependencies in feature map via non-local operations. Built on contextual feature map, Cascade Mask R-CNN structure progressively refines accurate boundaries of text instances with multi-stage framework. We conduct comparative experiments on ICDAR2015 and 2017-MLT datasets, where the proposed method outperform comparative methods in terms of effectiveness and efficiency measurements.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call