IoU-Related Arbitrary Shape Text Scoring Detector

Fagui Liu, Dian Gu, Cheng Chen

doi:10.1109/access.2019.2959018

Abstract

As a medium of information transmission, text exists widely in natural scenes and varies in orientation, scale, font, color and shape. Accurate detection of scene text is a prerequisite for subsequent recognition. Although many previous methods work well on horizontal and multi-oriented text detection datasets, detecting arbitrary shape scene text remains a challenging problem. To address it, this paper proposes an arbitrary shape scene text detection method. Based on Mask R-CNN, our method replaces the original smooth-L1 loss with the proposed IoU-related loss and adds a text scoring branch that aligns the confidence score with the text mask IoU. Both changes tie the model closely to IoU, so detection performance is improved by improving IoU directly, in a simple but effective way. The proposed method is evaluated on four public datasets: CTW-1500, Total-Text, ICDAR2015 and ICDAR2017-RCTW. On the curved text detection datasets CTW-1500 and Total-Text, we reach 79.2% and 81.1% H-mean respectively, showing that the proposed method achieves competitive performance in arbitrary shape scene text detection.

Highlights

Scene text detection is an important branch of object detection and an attractive task in computer vision, as it can be widely used in applications such as image search, automatic transmission, blind person assistance and real-time translation [1], [2].
Some Faster R-CNN based methods [13]–[15] achieve competitive performance on multi-oriented datasets, but are not ideal for the arbitrary shape text detection challenge.
To overcome the shortcomings mentioned above, we propose an arbitrary shape scene text detection method based on the generalized object detection structure Mask R-CNN [16].

Summary

INTRODUCTION

Scene text detection is an important branch of object detection and an attractive task in computer vision, as it can be widely used in applications such as image search, automatic transmission, blind person assistance and real-time translation [1], [2]. Some Faster R-CNN based methods [13]–[15] achieve competitive performance on multi-oriented datasets, but are not ideal for the arbitrary shape text detection challenge. Although they can detect most text instances, needless overlap and redundant background noise remain, both of which harm detection performance. For the text detection task, the ultimate goal of improving evaluation scores is to improve the Intersection-over-Union (IoU) between proposals and their corresponding ground truth. The methods mentioned above focus on changing the scale, aspect ratio, rotation angle and shape of anchors, changing the network framework, or introducing new modules to achieve better scores, while two problems still remain.
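To make the IoU between a proposal and its ground truth concrete, the following is a minimal sketch for axis-aligned boxes. The `(x1, y1, x2, y2)` box layout and the names `box_area` and `box_iou` are assumptions made for illustration; they are not taken from the paper, which also deals with arbitrary shape text regions rather than boxes alone.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; empty or inverted boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def box_iou(proposal: Box, ground_truth: Box) -> float:
    """Intersection-over-Union between a proposal and its ground truth."""
    # The intersection rectangle is bounded by the innermost edges.
    ix1 = max(proposal[0], ground_truth[0])
    iy1 = max(proposal[1], ground_truth[1])
    ix2 = min(proposal[2], ground_truth[2])
    iy2 = min(proposal[3], ground_truth[3])
    intersection = box_area((ix1, iy1, ix2, iy2))
    union = box_area(proposal) + box_area(ground_truth) - intersection
    # Guard against two degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0
```

A proposal that covers the text but also includes redundant background enlarges the union without enlarging the intersection, which is why such background noise lowers IoU and, with it, the evaluation score.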

RELATED WORKS

GENERALIZED COMPLETENESS-AWARE IoU LOSS

7: Find the coordinates of the smallest enclosing box $B^c$: $x_1^c$, $y_1^c$
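The smallest enclosing box in the step above can be sketched as follows. Because the full definition of the paper's completeness-aware loss is not reproduced in this summary, the loss shown here is the standard Generalized IoU (GIoU) loss, used only as a stand-in; the `(x1, y1, x2, y2)` layout and all function names are assumptions for illustration.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of an axis-aligned box; empty or inverted boxes count as 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def enclosing_box(pred: Box, gt: Box) -> Box:
    """Smallest axis-aligned box containing both the prediction and the ground truth."""
    return (
        min(pred[0], gt[0]),
        min(pred[1], gt[1]),
        max(pred[2], gt[2]),
        max(pred[3], gt[3]),
    )


def giou(pred: Box, gt: Box) -> float:
    """Standard GIoU: IoU minus the share of the enclosing box not covered by the union."""
    inter_box = (
        max(pred[0], gt[0]),
        max(pred[1], gt[1]),
        min(pred[2], gt[2]),
        min(pred[3], gt[3]),
    )
    intersection = area(inter_box)
    union = area(pred) + area(gt) - intersection
    enclosing_area = area(enclosing_box(pred, gt))
    if union <= 0 or enclosing_area <= 0:
        return 0.0  # both boxes are degenerate
    return intersection / union - (enclosing_area - union) / enclosing_area


def giou_loss(pred: Box, gt: Box) -> float:
    """Stand-in loss (not the paper's): 0 for a perfect match, up to 2 for distant boxes."""
    return 1.0 - giou(pred, gt)
```

Unlike a plain IoU loss, the enclosing-box term still gives a useful signal when the prediction and the ground truth do not overlap at all, since the penalty shrinks as the two boxes move closer together.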

TEXT SCORING BRANCH
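According to the abstract, the text scoring branch aligns the confidence score with the text mask IoU. A minimal sketch of that regression target is given below, assuming binary masks stored as nested lists of 0/1 with identical shapes. The squared-error loss and the names `mask_iou` and `scoring_loss` are assumptions for illustration; the summary does not state which loss the branch is trained with.

```python
from typing import Sequence

# Assumed mask layout: rows of 0/1 values, same shape for both masks.
Mask = Sequence[Sequence[int]]


def mask_iou(pred_mask: Mask, gt_mask: Mask) -> float:
    """Pixel-wise IoU between a predicted text mask and the ground-truth mask."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    return intersection / union if union else 0.0


def scoring_loss(predicted_score: float, pred_mask: Mask, gt_mask: Mask) -> float:
    """Hypothetical squared-error loss pulling the score toward the mask IoU."""
    target = mask_iou(pred_mask, gt_mask)
    return (predicted_score - target) ** 2
```

Training the score toward the mask IoU means that a detection with a high classification confidence but a poorly fitting mask ends up with a lower final score, which is the alignment the abstract describes.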

Findings

CONCLUSION AND FUTURE WORKS