Abstract

In document analysis, the classification of text document images is a challenging task in several fields of application, such as the archiving of old documents, administrative procedures, and security. In this context, visual appearance has been widely used for document classification and is considered a useful and relevant feature for this task. However, visual information alone is insufficient to achieve high classification rates, and relevant additional features, including textual features, can be leveraged to improve classification results. In this paper, we propose a multi-view deep representation learning approach that combines textual and visual information extracted, respectively, from the text and the visual appearance of document images. The approach is designed to learn a deeply shared representation of textual and visual features by fusing them into a joint latent space in which a classifier is trained to classify the document images. Our experimental results demonstrate the ability of the proposed model to outperform competitive approaches and to produce promising results.
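The abstract does not specify the fusion architecture, so the following is only a minimal illustrative sketch of the general idea: two view-specific encoders whose outputs are fused into a joint latent space on which a classifier is trained. The class name, layer sizes, and concatenation-based fusion are all assumptions, not the authors' actual model.

```python
import torch
import torch.nn as nn

class MultiViewDocClassifier(nn.Module):
    """Illustrative two-branch fusion model (not the paper's architecture):
    encodes textual and visual features separately, fuses them into a joint
    latent space, and classifies from the shared representation."""

    def __init__(self, text_dim=300, visual_dim=2048, latent_dim=256, num_classes=10):
        super().__init__()
        # View-specific encoders; all dimensions are placeholder assumptions.
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, latent_dim), nn.ReLU())
        self.visual_encoder = nn.Sequential(nn.Linear(visual_dim, latent_dim), nn.ReLU())
        # Fusion: concatenate both views and project into a shared latent space.
        self.fusion = nn.Sequential(nn.Linear(2 * latent_dim, latent_dim), nn.ReLU())
        # Classifier trained on the joint representation.
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, text_feats, visual_feats):
        z_text = self.text_encoder(text_feats)
        z_visual = self.visual_encoder(visual_feats)
        z_joint = self.fusion(torch.cat([z_text, z_visual], dim=1))
        return self.classifier(z_joint)

# Usage: a batch of 4 documents with hypothetical precomputed features.
model = MultiViewDocClassifier()
text_feats = torch.randn(4, 300)     # e.g. embeddings of OCR-extracted text
visual_feats = torch.randn(4, 2048)  # e.g. CNN features of the page image
logits = model(text_feats, visual_feats)  # shape: (4, num_classes)
```

Concatenation followed by a shared projection is only one of several plausible fusion strategies; element-wise combination or attention-based fusion would fit the same multi-view framing.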
