Abstract

Baybayin is a Tagalog-language writing system primarily used in the northern Philippines during the pre-Hispanic period. In 2018, the House of Representatives approved House Bill 1022, the “National Writing System Act,” which declares the Baybayin script the Philippines’ national writing system. Thus, documents, signage, books, and similar materials may soon carry Baybayin text. However, the Latin alphabet is still the primary script used in the country, so Latin and Baybayin scripts may appear in the same text. In this paper, we present an optical character recognition (OCR) system that distinguishes Baybayin script from Latin script in a text image. The preprocessing stage converts the input image to binary data using a modified k-means algorithm and computes the bounding box of each word in the text using the MATLAB ocr function. The classification stage then isolates each word and segments it into its character components. With the aid of a support vector machine (SVM) character classifier, we assign each word the script, either Baybayin or Latin, into which the highest number of its characters are classified. To the best of our knowledge, this is the first system that discriminates, at the block level, the Baybayin script from Latin. The proposed algorithm achieves a recognition accuracy of 93.64% when tested on a novel dataset. The accompanying code and the dataset are made publicly available so that the results of the study are reproducible.
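The word-level decision described above, in which a word takes the script assigned to most of its characters, can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the function name `word_script`, the label strings, and the tie-breaking rule (defaulting to Latin) are assumptions made for illustration, since the abstract does not say how ties are resolved. The per-character labels are assumed to come from a separate character classifier such as the SVM mentioned above.

```python
from collections import Counter
from typing import Sequence


def word_script(char_labels: Sequence[str], tie_break: str = "Latin") -> str:
    """Assign a word the script that most of its characters were classified as.

    char_labels holds one label per segmented character, each either
    "Baybayin" or "Latin". The tie_break default is an assumption made
    for this sketch, not a rule stated in the source.
    """
    if not char_labels:
        raise ValueError("a word needs at least one classified character")

    counts = Counter(char_labels)
    baybayin, latin = counts["Baybayin"], counts["Latin"]

    if baybayin == latin:
        return tie_break  # hypothetical rule for an even split
    return "Baybayin" if baybayin > latin else "Latin"


# Two of the three characters were labeled Baybayin, so the word is Baybayin.
print(word_script(["Baybayin", "Baybayin", "Latin"]))
```

Voting over characters means that a single misclassified character does not change the script assigned to a word, provided most of its characters are labeled correctly.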
