Abstract

Non-textual images such as charts and tables differ from natural images in several respects, including high inter-class similarities, low intra-class similarities, substantial proportions of textual components, and lower resolutions. This paper proposes a novel Multi-Dilated Context Aggregation based Dense Network (MDCADNet) that addresses the need for multi-resolution and larger receptive-field modeling in the non-textual component classification task. MDCADNet uses a densely connected convolutional network as its front-end for feature map computation, followed by a multi-dilated Backend Context Module (BCM). The proposed BCM generates multi-scale features and systematically aggregates context from both low-level and high-level feature maps through its densely connected layers. Additionally, the controlled multi-dilation scheme covers a wider range of scales, yielding better prediction performance. A thorough quantitative evaluation on seven benchmark datasets demonstrates the generalization capability of MDCADNet. Experimental results show that MDCADNet consistently outperforms state-of-the-art models across all datasets.
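To illustrate why the multi-dilation scheme enlarges the receptive field, the sketch below uses the standard receptive-field formula for a stack of stride-1 dilated convolutions, plus a naive 1-D dilated convolution. This is a minimal illustration of the general dilated-convolution mechanism, not code from the paper; the function names and the example dilation rates are hypothetical.

```python
import numpy as np

def receptive_field(dilations, kernel_size=3):
    """Receptive field of a stack of stride-1 dilated convolutions.

    Each layer with kernel size k and dilation d adds (k - 1) * d
    to the receptive field, so stacking growing dilation rates
    enlarges the field far faster than stacking plain convolutions.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

def dilated_conv1d(x, w, dilation):
    """Naive 'valid' 1-D dilated convolution (cross-correlation form).

    A tap at weight index j reads input position i + j * dilation,
    so one layer spans (len(w) - 1) * dilation + 1 input samples.
    """
    k = len(w)
    span = (k - 1) * dilation + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * dilation] for j in range(k))
    return out

# Hypothetical dilation schedule: three 3x3-style layers with rates 1, 2, 4
print(receptive_field([1, 2, 4]))          # -> 15 (vs. 7 for rates 1, 1, 1)
print(dilated_conv1d(np.arange(10.0), [1.0, 1.0, 1.0], dilation=2))
```

With dilation rates 1, 2, 4 the three layers see 15 input positions, whereas three undilated layers of the same kernel size see only 7, at the same parameter cost.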
