Abstract
The paper proposes an algorithm for script recognition based on texture characteristics. The image texture is obtained by coding each letter with its script type (a number code) according to its position in the text line. Each code is then mapped to an equivalent gray-level pixel, creating a 1-D image. The image texture is subjected to run-length analysis, which extracts run-length features that are classified to distinguish between the scripts under consideration. In the experiment, the proposed algorithm is applied to a custom-built database of text documents written in Cyrillic, Latin, and Glagolitic scripts. The database is divided into training and test parts. The results show that 3 out of 5 run-length features can be used to differentiate effectively between the analyzed South Slavic scripts.
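The run-length step described in the abstract can be illustrated with a minimal sketch. It assumes the letters have already been coded as integers (the four-level code set used here is hypothetical), and it assumes the five run-length features are the classical ones (short run emphasis, long run emphasis, gray-level non-uniformity, run-length non-uniformity, run percentage), which the abstract does not name. The function names are illustrative and are not taken from the paper.

```python
from itertools import groupby


def run_length_matrix(codes, levels):
    """Return p where p[i][j] counts runs of level levels[i] with length j + 1."""
    index = {level: i for i, level in enumerate(levels)}
    p = [[0] * len(codes) for _ in levels]
    # groupby splits the 1-D image into maximal runs of identical codes.
    for value, group in groupby(codes):
        length = sum(1 for _ in group)
        p[index[value]][length - 1] += 1
    return p


def run_length_features(codes, levels):
    """Compute five classical run-length features of a 1-D coded text line.

    Assumption: these are the five features meant by the abstract.
    """
    if not codes:
        raise ValueError("codes must contain at least one element")
    p = run_length_matrix(codes, levels)
    n_runs = sum(sum(row) for row in p)
    n_pixels = len(codes)
    max_run = len(p[0])
    sre = sum(c / (j + 1) ** 2 for row in p for j, c in enumerate(row)) / n_runs
    lre = sum(c * (j + 1) ** 2 for row in p for j, c in enumerate(row)) / n_runs
    gln = sum(sum(row) ** 2 for row in p) / n_runs
    rln = sum(sum(row[j] for row in p) ** 2 for j in range(max_run)) / n_runs
    rp = n_runs / n_pixels
    return {"SRE": sre, "LRE": lre, "GLN": gln, "RLN": rln, "RP": rp}


# Hypothetical coded text line: each integer stands for one letter's script type.
line = [1, 1, 2, 3, 3, 3, 1]
features = run_length_features(line, levels=[1, 2, 3, 4])
```

Because a run is defined only by consecutive equal values, the sketch works on the integer codes directly; mapping each code to a specific gray level first would not change the run lengths. The resulting feature dictionary is what a classifier would then receive for each text line.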
Highlights
The Balkan region, which is populated by South Slavs, is very rich in cultural heritage dating from the medieval age
South Slavs spoke the Old Church Slavonic language. It was written in the Glagolitic alphabet, known as the round Glagolitic script, which was later replaced by the Cyrillic script in the eastern Balkans, i.e. in Bulgaria and Macedonia
The manuscript proposes a methodology for script recognition in South Slavic documents
Summary
The Balkan region, which is populated by South Slavs, is very rich in cultural heritage dating from the medieval age. One of its most important cultural achievements is the variety of scripts used. South Slavs spoke the Old Church Slavonic language. It was written in the Glagolitic alphabet, known as the round Glagolitic script, which was later replaced by the Cyrillic script in the eastern Balkans, i.e. in Bulgaria and Macedonia. In Bosnia, a local version of the Cyrillic alphabet was used, while in Croatia a variant of the Glagolitic alphabet, called the squared or angular Glagolitic script, was preserved. All books from the medieval age were written in the aforementioned scripts. Serbian is the only European standard language with complete synchronic digraphia, using both the Cyrillic and Latin alphabets.