Abstract

Non-textual images such as charts and tables differ from natural images in several respects, including high inter-class similarities, low intra-class similarities, substantial proportions of textual components, and lower resolutions. This paper proposes a novel Multi-Dilated Context Aggregation based Dense Network (MDCADNet) that addresses the need for multi-resolution and larger receptive-field modeling in the non-textual component classification task. MDCADNet uses a densely connected convolutional network as its front-end for feature map computation, followed by a multi-dilated Backend Context Module (BCM). The proposed BCM generates multi-scale features and systematically aggregates context from both low-level and high-level feature maps through its densely connected layers. Additionally, the controlled multi-dilation scheme covers a wider range of scales, yielding better prediction performance. A thorough quantitative evaluation on seven benchmark datasets demonstrates the generalization capability of MDCADNet. Experimental results show that MDCADNet consistently outperforms state-of-the-art models across all datasets.
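To illustrate why the multi-dilation scheme enlarges the receptive field, the sketch below uses the standard receptive-field formula for a stack of stride-1 dilated convolutions, plus a naive 1-D dilated convolution. This is a minimal illustration of the general dilated-convolution mechanism, not code from the paper; the function names and the example dilation rates are hypothetical.

```python
import numpy as np

def receptive_field(dilations, kernel_size=3):
    """Receptive field of a stack of stride-1 dilated convolutions.

    Each layer with kernel size k and dilation d adds (k - 1) * d
    to the receptive field, so stacking growing dilation rates
    enlarges the field far faster than stacking plain convolutions.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

def dilated_conv1d(x, w, dilation):
    """Naive 'valid' 1-D dilated convolution (cross-correlation form).

    A tap at weight index j reads input position i + j * dilation,
    so one layer spans (len(w) - 1) * dilation + 1 input samples.
    """
    k = len(w)
    span = (k - 1) * dilation + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * dilation] for j in range(k))
    return out

# Hypothetical dilation schedule: three 3x3-style layers with rates 1, 2, 4
print(receptive_field([1, 2, 4]))          # -> 15 (vs. 7 for rates 1, 1, 1)
print(dilated_conv1d(np.arange(10.0), [1.0, 1.0, 1.0], dilation=2))
```

With dilation rates 1, 2, 4 the three layers see 15 input positions, whereas three undilated layers of the same kernel size see only 7, at the same parameter cost.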
