Abstract
Inspired by the recent successes of the attention-based encoder-decoder (AED) approach in image captioning and machine translation, we present an AED model as an end-to-end system for recognizing Japanese historical documents. The recognition system has two main modules: a dense convolutional neural network for extracting features, and a Long Short-Term Memory (LSTM) decoder integrated with an attention model for generating the target text. The model is trained end-to-end and requires only text line images and their corresponding character sequences. It therefore needs no character-level annotations, which saves considerable annotation time. We also present a method for generating artificial text lines to address the imbalance problem of the current annotated database. The results of experiments on the annotated and artificial databases demonstrate the effectiveness of the text line generation. Our recognition system achieved Character Error Rates of 23.76% and 22.52% by training with and without artificial text lines, respectively. Moreover, our recognition system outperforms the CNN-LSTM system, which achieved state-of-the-art results in other document recognition tasks.
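The abstract reports results as Character Error Rate (CER) without defining it. The sketch below assumes the common definition: the total Levenshtein (edit) distance between recognized and ground-truth text lines, divided by the total number of ground-truth characters, expressed as a percentage. The function names `edit_distance` and `character_error_rate` are illustrative and do not come from the paper, and the exact evaluation protocol behind the reported figures may differ.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """CER in percent over a set of text lines (assumed definition)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_chars
```

For instance, a four-character ground-truth line recognized with one wrong character gives `character_error_rate(["いろはに"], ["いろほに"])`, one substitution over four characters. Pooling edits and characters over all lines, rather than averaging per-line rates, keeps short lines from dominating the score; this choice is also an assumption.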