Abstract

Documents are essential and ubiquitous. To manage the vast number of documents that companies handle, a first step is to automatically determine the type of each document (its class). Although automatic classification has been widely studied, a strongly imbalanced context and industrial constraints raise challenges that have not yet been studied: how to classify as many documents as possible with the highest precision, in an imbalanced context and with some classes missing during training? To this end, this paper studies two solutions to these issues. The first is a multimodal neural network, reinforced by an attention model and an adapted loss function, that is able to classify a great variety of documents. The second is a combination method that uses a cascade of systems to offer a gradual solution for each issue. Both options provide good results in the ideal context as well as in the imbalanced context. The comparison outlines their limitations and the future challenges.

