Abstract

Tibetan historical documents are vast, second in quantity only to Chinese historical documents in China, and they are considered a treasure of Chinese culture. The digital protection and utilization of Tibetan literature resources is an active topic in the field of literature digitization. Layout analysis is an important basic step in the digitization of historical documents. Tibetan historical documents have complex layouts, a variety of graphic and text forms, and diverse backgrounds, all of which affect layout analysis. We design a method that combines deep learning text line detection with rule-based layout analysis to perform layout analysis of Tibetan historical documents. The method first detects text through deep learning, then constructs text lines, and finally uses rule analysis to separate horizontal text regions from vertical text regions, which yields the segmentation of the layout. Experiments on our self-built datasets, which contain rich sample types, show that the proposed method can detect a variety of layouts with high accuracy and provide reliable text regions for subsequent text recognition, giving it strong application value.
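The abstract does not give the rules themselves, so the following is only a minimal sketch of what the last two stages (text line construction and horizontal/vertical separation) might look like. It assumes the detector has already produced axis-aligned boxes with positive width and height. The names `Box`, `build_lines`, and `orientation`, and the thresholds `min_overlap`, `max_gap`, and `ratio`, are all hypothetical and not taken from the paper. Fragments are chained left to right only, so a vertical column is assumed to arrive as a single tall box.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x, y) is the top-left corner, in pixels."""
    x: float
    y: float
    w: float
    h: float


def merge(boxes: List[Box]) -> Box:
    """Return the smallest box that encloses every box in the group."""
    x0 = min(b.x for b in boxes)
    y0 = min(b.y for b in boxes)
    x1 = max(b.x + b.w for b in boxes)
    y1 = max(b.y + b.h for b in boxes)
    return Box(x0, y0, x1 - x0, y1 - y0)


def v_overlap(a: Box, b: Box) -> float:
    """Vertical overlap of two boxes, as a fraction of the shorter one."""
    inter = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(0.0, inter) / min(a.h, b.h)


def build_lines(boxes: List[Box], min_overlap: float = 0.5,
                max_gap: float = 20.0) -> List[Box]:
    """Chain detected fragments left to right into text lines.

    Assumed rule (not from the paper): a fragment joins a line when it
    overlaps the line's last fragment vertically and the horizontal gap
    between them is small.
    """
    groups: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.x):
        for group in groups:
            last = group[-1]
            gap = box.x - (last.x + last.w)
            if v_overlap(last, box) >= min_overlap and gap <= max_gap:
                group.append(box)
                break
        else:
            # No existing line accepted the fragment: start a new line.
            groups.append([box])
    return [merge(g) for g in groups]


def orientation(line: Box, ratio: float = 1.5) -> str:
    """Label a text line by its aspect ratio (assumed rule)."""
    if line.w >= ratio * line.h:
        return "horizontal"
    if line.h >= ratio * line.w:
        return "vertical"
    return "ambiguous"
```

A call such as `[orientation(l) for l in build_lines(boxes)]` would label each constructed line. In practice the thresholds would need tuning to the scan resolution and to the page formats in the dataset.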
