Abstract

In document analysis, the classification of text document images is a challenging task in several fields of application, such as the archiving of old documents, administrative procedures, and security. In this context, visual appearance has been widely used for document classification and is considered a useful and relevant feature for this task. However, visual information alone is insufficient to achieve high classification rates, and relevant additional features, including textual features, can be leveraged to improve classification results. In this paper, we propose a multi-view deep representation learning approach that combines textual and visual information extracted, respectively, from the text and the visual appearance of document images. The approach is designed to learn a deeply shared representation of textual and visual features by fusing them into a joint latent space in which a classifier is trained to classify the document images. Our experimental results demonstrate the ability of the proposed model to outperform competitive approaches and to produce promising results.
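The abstract does not specify the fusion architecture, so the following is only a minimal illustrative sketch of the general idea: two view-specific encoders whose outputs are fused into a joint latent space on which a classifier is trained. The class name, layer sizes, and concatenation-based fusion are all assumptions, not the authors' actual model.

```python
import torch
import torch.nn as nn

class MultiViewDocClassifier(nn.Module):
    """Illustrative two-branch fusion model (not the paper's architecture):
    encodes textual and visual features separately, fuses them into a joint
    latent space, and classifies from the shared representation."""

    def __init__(self, text_dim=300, visual_dim=2048, latent_dim=256, num_classes=10):
        super().__init__()
        # View-specific encoders; all dimensions are placeholder assumptions.
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, latent_dim), nn.ReLU())
        self.visual_encoder = nn.Sequential(nn.Linear(visual_dim, latent_dim), nn.ReLU())
        # Fusion: concatenate both views and project into a shared latent space.
        self.fusion = nn.Sequential(nn.Linear(2 * latent_dim, latent_dim), nn.ReLU())
        # Classifier trained on the joint representation.
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, text_feats, visual_feats):
        z_text = self.text_encoder(text_feats)
        z_visual = self.visual_encoder(visual_feats)
        z_joint = self.fusion(torch.cat([z_text, z_visual], dim=1))
        return self.classifier(z_joint)

# Usage: a batch of 4 documents with hypothetical precomputed features.
model = MultiViewDocClassifier()
text_feats = torch.randn(4, 300)     # e.g. embeddings of OCR-extracted text
visual_feats = torch.randn(4, 2048)  # e.g. CNN features of the page image
logits = model(text_feats, visual_feats)  # shape: (4, num_classes)
```

Concatenation followed by a shared projection is only one of several plausible fusion strategies; element-wise combination or attention-based fusion would fit the same multi-view framing.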
