Abstract
The digitization of Tibetan ancient books is of great significance to the preservation of Tibetan culture. It involves two tasks: Tibetan text detection and Tibetan text recognition, and the former is crucial to automatic Tibetan text recognition. However, there are few works on Tibetan text detection, and the lack of training data has been a persistent problem, especially for deep learning methods, which require large amounts of training data. In this paper, we introduce the TxTAB dataset for evaluating text detection methods in Tibetan ancient books. The dataset is built from 202 images of treasured handwritten ancient Tibetan texts and is densely annotated with a multi-point annotation method that places no limit on the number of points. It is a challenging and diverse dataset, containing blurred images, grayscale and color images, text of different sizes, and text in different handwriting styles, among other variations. We present an extensive experimental evaluation of 3 state-of-the-art text detection algorithms on TxTAB with detailed analysis. The results demonstrate that there is still considerable room for improvement, particularly in detecting Tibetan text in low-quality images.