A Human-Inspired Recognition System for Pre-Modern Japanese Historical Documents

Anh Duc Le, Tarin Clanuwat, Asanobu Kitamoto

doi:10.1109/access.2019.2924449

Abstract

Recognition of historical documents is a challenging problem because of noisy, damaged characters and backgrounds. Japanese historical documents pose these problems and more: pre-modern Japanese characters were written in cursive and are connected to one another. Character segmentation-based methods therefore do not work well, which motivates a new recognition system. In this paper, we propose a human-inspired document reading system to recognize multiple lines of pre-modern Japanese historical documents. While reading, people use eye movements to determine the start of a text line. They then move their eyes from the current character or word to the next. They can also determine the end of a line, or skip a figure, to move to the next line. These eye movements are integrated with visual processing to carry out the reading process in the brain. We employ an attention-based encoder–decoder to implement this recognition system. First, the system detects where to start a text line. Second, it scans and recognizes character by character until the text line is completed. The system then detects the start of the next text line. This process is repeated until the whole document has been read. As a result, the system successfully recognizes multiple lines of connected, cursive characters without performing character or line segmentation. We also employ a coverage model, which stores the history of eye movements, to predict the next movement more precisely. We tested our human-inspired recognition system on the pre-modern Japanese historical documents provided by the PRMU Kuzushiji competition. The experimental results demonstrate the effectiveness of the proposed system, which achieves Sequence Error Rates of 9.87% and 53.81% on level 2 and level 3 of the dataset, respectively. These results outperform all other systems that participated in the PRMU Kuzushiji competition.
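The role of the coverage model can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each image position already has a scalar relevance score (in the real system these would come from the learned encoder and decoder), and it replaces the learned coverage term with a hypothetical fixed `penalty` applied to positions that have already received attention. The names `attend`, `relevance`, `coverage` and `penalty` are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(relevance, coverage, penalty=5.0):
    """One decoding step of a toy coverage-aware attention.

    relevance: hypothetical per-position scores (assumed given).
    coverage:  accumulated attention from all previous steps.
    penalty:   assumed fixed strength of the coverage discount;
               a trained model would learn this interaction instead.
    Returns the attention weights and the updated coverage.
    """
    scores = [r - penalty * c for r, c in zip(relevance, coverage)]
    weights = softmax(scores)
    new_coverage = [c + w for c, w in zip(coverage, weights)]
    return weights, new_coverage


if __name__ == "__main__":
    relevance = [3.0, 2.0, 1.0]
    coverage = [0.0, 0.0, 0.0]
    for step in range(2):
        weights, coverage = attend(relevance, coverage)
        focus = weights.index(max(weights))
        print(f"step {step}: focus on position {focus}")
```

With the penalty set to zero the toy attention keeps returning to the most relevant position, whereas with a positive penalty the second step shifts to a position that has been attended to less. That is the intuition behind storing the attention history to guide the next movement.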

Highlights

Throughout the development of human civilizations, writing systems have changed over time in every language
Our proposed human-inspired recognition system outperforms the best previous system, the DCRN system by Nguyen et al., which won the Kuzushiji competition
Our system simulates human reading behavior: it determines the start character of a text line, scans the characters, and determines the end of the text line in order to move to the next text line
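The Sequence Error Rate figures quoted in the abstract can be read with the following sketch in mind. It assumes the usual definition, namely the percentage of test sequences whose predicted transcription differs from the ground truth in any way; the page itself does not spell out the formula, so treat this as an assumption. The function name and the sample strings are hypothetical.

```python
def sequence_error_rate(predictions, references):
    """Percentage of sequences that are not transcribed exactly.

    Assumed definition: a sequence counts as an error if the predicted
    string differs from the reference string in any character.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sequence is required")
    wrong = sum(1 for p, r in zip(predictions, references) if p != r)
    return 100.0 * wrong / len(references)


if __name__ == "__main__":
    # Hypothetical transcriptions: the second one has a single wrong character.
    predicted = ["あい", "うえ"]
    expected = ["あい", "うお"]
    print(sequence_error_rate(predicted, expected))
```

Under this definition a single wrong character makes the whole sequence count as an error, so the metric becomes much stricter as sequences grow longer or span several lines.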

Summary

INTRODUCTION

Throughout the development of human civilizations, writing systems have changed over time in every language. China changed traditional Chinese characters to simplified Chinese characters in 1964. These countries have suffered from the problem of translating past knowledge written in historical resources into current languages. The reading module helps users learn how to read Kuzushiji characters from real classical texts. They developed three recognition systems based on a convolutional neural network (CNN) and Bidirectional Long Short-Term Memory (BLSTM) for three tasks [5]. Traditional recognition systems perform well on printed and handwritten documents, but they are still insufficient for historical documents such as Kuzushiji documents. People read documents by scanning characters from line to line and from word to word. They are able to skip figures or spaces and move to the next line. The system is trained end-to-end, which yields better performance than previous systems trained separately.

RELATED WORKS

PROBLEM DEFINITION

DATASET

EXPERIMENTS

CONCLUSION

Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 13	License type: CC BY 4.0
