T‐Skeleton: Accurate scene text detection via instance‐aware skeleton embedding

Haiyan Li, Xingfei Hu, Hongtao Lu

doi:10.1049/ipr2.13043

Abstract

Existing segmentation‐based methods have made considerable progress in arbitrarily shaped text detection because they handle shape variation well. However, accurately detecting text instances remains challenging under dense layouts, inaccurate annotations, and complex backgrounds. Many recent works focus on improving arbitrary boundary prediction, yet instances in dense layouts can still be hard to distinguish: with inaccurate annotations and complex backgrounds, boundary pixels may be misclassified, producing inaccurate results in which neighboring instances stick together (i.e., adhesive texts). Considering both local and long‐range dependencies, this paper proposes an efficient text detector, namely T‐Skeleton, to obtain more reliable segmentation‐based detections. In the spirit of object skeletonization, we introduce the text instance skeleton, which highlights the semantically significant structure of a text instance (similar to the skeleton of a fish), to explicitly capture its long‐range dependencies. The key idea of T‐Skeleton is to calibrate coarse text proposals by embedding text instance skeletons so that crowded texts are separated accurately and robustly. We further design a channel attention module to enlarge the performance margin between T‐Skeleton and the segmentation baseline. Experimental results on four publicly available datasets show the superiority of T‐Skeleton in handling long and curved texts.
