SCUT-EPT: New Dataset and Benchmark for Offline Chinese Text Recognition in Examination Paper

Yuanzhi Zhu, Lianwen Jin, Ming Zhang, Xiaoxue Chen, Zecheng Xie, Yaoxiong Huang

doi:10.1109/access.2018.2885398

Open Access

Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 65	License type: cc-by-nc-nd

Affiliation: South China University of Technology

Abstract

Most existing studies and public datasets for handwritten Chinese text recognition are based on regular documents with clean, blank backgrounds, and research on handwritten text recognition in challenging domains such as educational documents and financial bills is lacking. In this paper, we focus on examination paper text recognition and construct a challenging dataset named the examination paper text (SCUT-EPT) dataset, which contains 50 000 text line images (40 000 for training and 10 000 for testing) selected from the examination papers of 2 986 volunteers. The proposed SCUT-EPT dataset presents numerous novel challenges, including character erasure, text line supplement, character/phrase switching, noisy background, nonuniform word size, and unbalanced text length. In our experiments, current advanced text recognition methods, such as the convolutional recurrent neural network (CRNN), exhibit poor performance on the proposed SCUT-EPT dataset, demonstrating the challenge and significance of the dataset. Nevertheless, through visualization and error analysis, we observe that humans can avoid the vast majority of the erroneous predictions, which reveals the limitations and drawbacks of current methods for handwritten Chinese text recognition (HCTR). Finally, three popular sequence transcription methods, connectionist temporal classification (CTC), the attention mechanism, and cascaded attention-CTC, are investigated for the HCTR problem. Interestingly, although the attention mechanism has proved very effective in English scene text recognition, its performance is far inferior to that of CTC for HCTR with a large-scale character set.
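
To make the CTC transcription step mentioned in the abstract more concrete, the following is a minimal sketch of best-path (greedy) CTC decoding in Python. It is not the authors' implementation: it assumes the recognizer emits one probability row per time step, that index 0 is the CTC blank, and it uses a hypothetical two-character alphabet. Decoding picks the most probable label per frame, merges consecutive repeats, and drops blanks; a large-scale Chinese character set only makes each probability row wider.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding (sketch): argmax per frame, merge repeats, drop blanks.

    Assumption: frame_probs[t][k] is the probability of label k at time step t,
    and `blank` is the index of the CTC blank symbol.
    """
    # Step 1: most probable label index at every time step (ties keep the lowest index).
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_probs]

    decoded: List[int] = []
    prev = None
    for label in best_path:
        # Step 2 and 3 in one pass: skip a label that repeats the previous frame,
        # and skip blanks. Because `prev` is also updated on blanks, a blank between
        # two identical labels lets the repeated character be emitted twice.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    # Hypothetical toy charset; index 0 is reserved for the CTC blank.
    alphabet = {1: "考", 2: "试"}
    frames = [
        [0.1, 0.8, 0.1],    # argmax 1
        [0.2, 0.7, 0.1],    # argmax 1 (repeat, merged)
        [0.9, 0.05, 0.05],  # argmax 0 (blank, dropped)
        [0.1, 0.1, 0.8],    # argmax 2
        [0.3, 0.1, 0.6],    # argmax 2 (repeat, merged)
    ]
    print("".join(alphabet[i] for i in ctc_greedy_decode(frames)))  # prints 考试
```

Greedy decoding is the simplest CTC decoder; beam search or lexicon/language-model constrained decoding are common alternatives when higher accuracy is needed.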

Similar Papers

Deep Learning Based Handwritten Chinese Character and Text Recognition
Xu-Yao Zhang ... Yi-Chao Wu
01 Jan 2019

A Fast and Accurate Fully Convolutional Network for End-to-End Handwritten Chinese Text Segmentation and Recognition
Dezhi Peng ... Mingxiang Cai
01 Sep 2019

Learning confidence transformation for handwritten Chinese text recognition
Da-Han Wang ... Cheng-Lin Liu
International Journal on Document Analysis and Recognition (IJDAR) | VOL. 17
05 Nov 2013

Handwritten Chinese text editing and recognition system
Shusen Zhou ... Xiaolong Wang
Multimedia Tools and Applications | VOL. 71
14 Nov 2012
