Identification of Layout Elements in Chinese Academic Papers Based on Mask R-CNN

Ziyi Yang, Ning Li

doi:10.1109/iccece54139.2022.9712723

Abstract

To meet the need for automatic layout analysis of Chinese academic papers and to address the incomplete analysis of layout elements by existing methods, this paper proposes a method for recognizing layout elements in Chinese academic papers based on Mask R-CNN. First, data were collected and manually annotated to construct a layout image dataset of Chinese academic papers. Then, a weighted anchor box generation mechanism was introduced into the conventional Mask R-CNN architecture to improve the region proposal network (RPN). Finally, a layout element recognition model for layout images of Chinese academic papers was built. Experiments show that the approach effectively recognizes and precisely locates nine layout elements in academic papers, including headers, titles at different levels, main body text, figures, tables, formulas, and references, with an accuracy of up to 89.3%. The model can therefore better satisfy the requirements of practical application scenarios, and it has significant application value as a basis for document information extraction, layout reconstruction, quality evaluation, and other applications.

Full Text