Abstract

CNN-based scene text detection methods have achieved strong results. Most of them are built on fully convolutional networks followed by non-maximum suppression (NMS), combining the two tasks of text classification and localization. In the NMS step, however, most methods filter bounding boxes by classification confidence alone. As a result, well-located text boxes can be suppressed during NMS when their classification confidence is lower than that of a poorly located overlapping box. In this paper, we propose an intersection-over-union (IOU) network that predicts the IOU between a bounding box and its matched ground truth. The predicted IOU serves as a localization confidence and is fused with the classification confidence. In NMS, the fused confidence then replaces the classification confidence as the ranking criterion, so that accurately located text boxes are preserved. Experiments on the ICDAR2011 and ICDAR2013 datasets show that the proposed method effectively improves the accuracy of text detection.
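The effect of changing the NMS ranking criterion can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state how the two confidences are fused, so the `fuse` function below (a plain product) is an assumption, and the boxes, scores, and the 0.5 overlap threshold are invented for illustration. The predicted IOU values stand in for the output of the IOU network.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(cls_conf: float, pred_iou: float) -> float:
    """Assumed fusion rule (hypothetical): product of the two confidences."""
    return cls_conf * pred_iou


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Illustrative data: boxes 0 and 1 cover the same text line; box 2 is separate.
boxes = [(0, 0, 100, 20), (5, 0, 105, 20), (200, 0, 260, 20)]
cls_conf = [0.95, 0.85, 0.70]   # classification confidence
pred_iou = [0.55, 0.90, 0.80]   # stand-in for the IOU network's prediction

fused = [fuse(c, p) for c, p in zip(cls_conf, pred_iou)]

keep_by_cls = nms(boxes, cls_conf)   # ranks by classification confidence
keep_by_fused = nms(boxes, fused)    # ranks by fused confidence
print(keep_by_cls, keep_by_fused)
```

With these made-up numbers, ranking by classification confidence keeps box 0 and suppresses box 1, even though box 1 has the higher predicted IOU. Ranking by the fused score reverses that choice, which is the behavior the abstract describes.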
