Abstract

To prevent the compilation of documents, many table documents are formatted with non-editable and non-structured texts such as PDFs or images. Quickly recognizing the contents of tables is still a challenge due to factors such as irregular formats, uneven text quality, and complex and diverse table content. This article proposes the UTTSR table recognition model, which consists of four parts: text region detection, text line detection and recognition, and table sequence recognition. For table detection, the Cascade Faster RCNN with the ResNeXt105 network is implemented, using TPS (Thin Plate Spline) transformation and affine transformation to correct the image and to improve accuracy. For text line detection, DBNET is used with Do-Conv in FPN (Feature Pyramid Networks) to speed up training. Text lines are recognized using CRNN without the CTC module, enhancing recognition performance. Table sequence recognition is based on the transformer combined with post-processing algorithms that fuse table structure sequences and unit grid content. Experimental results show that the UTTSR model outperforms the compared methods. This upgraded model significantly improves the accuracy of the previous state-of-the-art F1 score on complex tables, reaching 97.8%.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.