Abstract
As awareness of the value of national culture has grown, the protection of ancient books and literature has received increasing attention. This paper proposes a deep-learning method for recognizing text in ancient books and uses the ancient book “usonarubesh” as the dataset for testing the model. In the experiment, the text layout is extracted as a grayscale image with ARU-Net (a neural pixel labeler for historical document layout analysis). In parallel, the original image containing the text is binarized so that the text is filled with black and the background with white. Each text region is then identified from the density of black pixels together with the extracted layout. The cropped text regions serve as the test dataset for a trained deep-learning CNN, AlexNet (the training dataset is already prepared). Finally, the experimental results are analyzed to draw conclusions and to set the direction of future work.
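The binarization and black-pixel-density step can be illustrated with a minimal sketch. It is not the paper's implementation: the fixed threshold of 128, the minimum density of 0.1, the column-wise projection, and the function names `binarize`, `column_density`, and `text_spans` are all assumptions made for illustration, and the ARU-Net layout information is left out.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = black text, 0 = white background.

    The fixed threshold is an assumption; the abstract does not name a method.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def column_density(binary):
    """Return the fraction of black pixels in each column."""
    height = len(binary)
    return [sum(col) / height for col in zip(*binary)]


def text_spans(density, min_density=0.1):
    """Return half-open (start, end) column ranges whose density is >= min_density."""
    spans = []
    start = None
    for x, d in enumerate(density):
        if d >= min_density and start is None:
            start = x                      # a dense run begins
        elif d < min_density and start is not None:
            spans.append((start, x))       # the run ended at the previous column
            start = None
    if start is not None:                  # the run reaches the right edge
        spans.append((start, len(density)))
    return spans


# Toy 4x8 "page": dark pixels in columns 1-2 and column 6 stand in for text.
page = [
    [255, 30, 20, 255, 255, 255, 40, 255],
    [255, 25, 255, 255, 255, 255, 35, 255],
    [255, 10, 15, 255, 255, 255, 50, 255],
    [255, 255, 12, 255, 255, 255, 45, 255],
]

binary = binarize(page)
density = column_density(binary)
spans = text_spans(density)
print(spans)
```

On this toy input the two dense column runs are reported as separate spans. In a fuller pipeline, each span would be cropped from the page and passed to the classifier.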