DRFN: A unified framework for complex document layout analysis

Xingjiao Wu, Tianlong Ma, Xiangcheng Du, Ziling Hu, Jing Yang, Liang He

doi:10.1016/j.ipm.2023.103339

Abstract

Document layout analysis (DLA) plays a vital role in information processing and management. Currently, the processing of non-Manhattan layout documents is the bottleneck in building a universal document layout analysis framework. To address this challenge, we propose CDSSE, a Complex Document Semantic Structure Extraction dataset of non-Manhattan document layouts. Furthermore, we design a Dynamic Residual Feature fusion Network (DRFN) to integrate the feature differences between non-Manhattan and Manhattan layouts. During fusion, the DRFN makes full use of low-dimensional information and maintains the integrity of high-level semantic information through a Dynamic Residual Fusion Block (DRF). To overcome the model overfitting caused by data scarcity, we propose a novel Dynamic Selection Mechanism (DSM). We show that the DRFN achieves comparable results on all benchmark datasets. For Manhattan layout documents, F1 reaches 89.5% on DSSE-200 and 95.1% on CS-150. For non-Manhattan layout documents, F1 reaches 86.8% on CDSSE. We also verify the effectiveness of the model structure. Using the DRF improves performance on all datasets (DSSE-200: 76.6% vs. 80.3%; CS-150: 91.7% vs. 93.1%; CDSSE: 62.6% vs. 71.8%), as does using the DSM (DSSE-200: 89.0% vs. 89.5%; CS-150: 94.3% vs. 95.1%; CDSSE: 84.8% vs. 86.8%).

Full Text