UTTSR: A Novel Non-Structured Text Table Recognition Model Powered by Deep Learning Technology

Min Li, Delong Han, Liping Zhang, Mingle Zhou

doi:10.3390/app13137556

Abstract

To prevent documents from being edited, many tables are distributed in non-editable, non-structured formats such as PDFs or images. Quickly recognizing the contents of such tables remains a challenge because of irregular formats, uneven text quality, and complex and diverse table content. This article proposes the UTTSR table recognition model, which consists of four parts: table detection, text line detection, text line recognition, and table sequence recognition. For table detection, Cascade Faster RCNN with the ResNeXt105 network is used, and TPS (Thin Plate Spline) and affine transformations correct the image to improve accuracy. For text line detection, DBNet is used with DO-Conv in the FPN (Feature Pyramid Network) to speed up training. Text lines are recognized using CRNN without the CTC module, which improves recognition performance. Table sequence recognition is based on the transformer, combined with post-processing algorithms that fuse table structure sequences with cell content. Experimental results show that the UTTSR model outperforms the compared methods. It improves on the previous state-of-the-art F1 score on complex tables, reaching 97.8%.
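
The post-processing step that fuses a predicted table structure sequence with recognized cell text can be illustrated with a minimal sketch. It is not the authors' algorithm: the HTML-like token vocabulary, the function name fuse_structure_and_cells, and the assumption that cell texts arrive in reading order are all hypothetical choices made for illustration.

```python
import html
from typing import List


def fuse_structure_and_cells(structure_tokens: List[str], cell_texts: List[str]) -> str:
    """Insert recognized cell text into a predicted structure token sequence.

    Assumptions (not from the paper): the structure is a flat list of
    HTML-like tokens, and cell_texts is ordered to match the cells.
    """
    parts = []
    cell_index = 0
    for token in structure_tokens:
        if token == "</td>":
            # Fill the cell just before it closes. If the recognizer returned
            # fewer texts than the structure has cells, leave the cell empty.
            if cell_index < len(cell_texts):
                parts.append(html.escape(cell_texts[cell_index]))
            cell_index += 1
        parts.append(token)
    return "".join(parts)


structure = ["<table>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</table>"]
cells = ["Name", "Score"]
fused = fuse_structure_and_cells(structure, cells)
print(fused)  # <table><tr><td>Name</td><td>Score</td></tr></table>
```

A real system would also need to match cells to text lines by position (for example, by bounding-box overlap) and handle spanning cells; the sketch above only shows the final merge once that matching is done.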

Full Text