Abstract

The paper proposes an algorithm for script recognition based on texture characteristics. The texture is obtained by coding each letter with its equivalent script type (a number code) according to its position in the text line. Each code is then transformed into an equivalent gray-level pixel, creating a 1-D image. This image texture is subjected to run-length analysis, which extracts run-length features. These features are classified to distinguish between the scripts under consideration. In the experiment, the proposed algorithm is applied to a custom-built database of text documents written in the Cyrillic, Latin and Glagolitic scripts, divided into training and test parts. The results show that 3 out of 5 run-length features can be used to differentiate effectively between the analyzed South Slavic scripts.
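
The coding, gray-level mapping and run-length steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The four script-type classes (base, ascender, descender and full letter) and the names BASE, ASCENDER, DESCENDER, FULL, to_gray, runs and run_length_features are hypothetical. The abstract does not name its five run-length features, so the classic gray-level run-length set (short run emphasis, long run emphasis, gray-level nonuniformity, run-length nonuniformity and run percentage) is assumed. The letter classification that produces the codes and the final classifier are not shown.

```python
from collections import Counter

# Hypothetical script-type codes (assumption, not given in the abstract):
# each letter is labeled by where it sits relative to the text line.
BASE, ASCENDER, DESCENDER, FULL = 0, 1, 2, 3
N_TYPES = 4


def to_gray(codes, n_types=N_TYPES):
    """Map each script-type code to an evenly spaced gray level in 0..255."""
    step = 255 // (n_types - 1)
    return [c * step for c in codes]


def runs(seq):
    """Return (value, run_length) pairs for maximal runs of equal values."""
    out = []
    for v in seq:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out


def run_length_features(seq):
    """Five classic run-length texture features of a 1-D sequence (assumed set)."""
    if not seq:
        raise ValueError("empty sequence")
    r = runs(seq)
    n_runs = len(r)
    per_level = Counter(v for v, _ in r)             # number of runs per gray level
    per_length = Counter(length for _, length in r)  # number of runs per run length
    return {
        "SRE": sum(1 / length**2 for _, length in r) / n_runs,  # short run emphasis
        "LRE": sum(length**2 for _, length in r) / n_runs,      # long run emphasis
        "GLN": sum(c**2 for c in per_level.values()) / n_runs,  # gray-level nonuniformity
        "RLN": sum(c**2 for c in per_length.values()) / n_runs, # run-length nonuniformity
        "RP": n_runs / len(seq),                                # run percentage
    }


if __name__ == "__main__":
    codes = [BASE, BASE, ASCENDER, ASCENDER, ASCENDER, BASE, DESCENDER, FULL, FULL]
    image = to_gray(codes)  # [0, 0, 85, 85, 85, 0, 170, 255, 255]
    print(run_length_features(image))
```

Because the code-to-gray mapping is one-to-one, computing the features on the codes or on the gray-level image gives the same values. The gray levels matter only if the 1-D image is visualized or passed to standard image-texture tooling.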

Highlights

  • The Balkan region, populated by South Slavs, is very rich in cultural heritage elements dating from the medieval age

  • South Slavs spoke the Old Church Slavonic language. It was written in the Glagolitic alphabet, in the form called round Glagolitic script, which was later replaced by the Cyrillic script in the eastern Balkans, i.e. in Bulgaria and Macedonia

  • The manuscript proposes a methodology for script recognition in South Slavic documents

Introduction

The Balkan region, populated by South Slavs, is very rich in cultural heritage elements dating from the medieval age. One of its most important cultural achievements is the variety of scripts in use. South Slavs spoke the Old Church Slavonic language. It was written in the Glagolitic alphabet, in the form called round Glagolitic script, which was later replaced by the Cyrillic script in the eastern Balkans, i.e. in Bulgaria and Macedonia. In Bosnia, a local version of the Cyrillic alphabet was used, while in Croatia a variant of the Glagolitic alphabet, called squared or angular Glagolitic script, was preserved. All books from the medieval age were written in the aforementioned scripts. Serbian is the only European standard language with complete synchronic digraphia, as it uses both the Cyrillic and Latin alphabets.
