Abstract

The diverse cultures and ethnic groups in the Philippines create a beautiful mixture of ideas, traditions, and practices, but this diversity also makes it hard for researchers to keep track of them all. One integral part of any culture is language, and one of the most widely spoken languages in the Cordillera Administrative Region (CAR) is Kankanaey. Unfortunately, very few resources and little documentation exist for it. This paper presents a corpus created for Kankanaey that contains 3,412 words, along with a model trained on a dataset of 400 Kankanaey sentences in order to establish the language's syntactic rules. The Kankanaey texts were collected from public online sources and organized into categories based on the type of content. Training and testing to establish the syntactic rules were done using the Keras API. The rules were derived by tagging each word in the training sentences with its corresponding part-of-speech (POS) tag. After tagging, the set of POS tags was expanded to all possible combinations of POS tags, which resulted in the documentation of 1,722 syntactic rules for Kankanaey. The model achieved an accuracy of 64% when tested on identifying the syntactic rules in 50 test sentences.
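
The rule-derivation step described above can be illustrated with a minimal sketch. It is not the paper's Keras model: it assumes that a "syntactic rule" is simply the sequence of POS tags of a sentence, and it measures how many test sentences match a rule seen in training. The tokens (`w1`, `w2`, ...) are placeholders rather than real Kankanaey words, and the tag set and the function names `extract_rules` and `rule_coverage` are hypothetical.

```python
from collections import Counter

# Hypothetical tagged sentences as (token, POS tag) pairs.
# Tokens are placeholders, not real Kankanaey words; the tag set is an assumption.
tagged_sentences = [
    [("w1", "NOUN"), ("w2", "VERB"), ("w3", "NOUN")],
    [("w4", "PRON"), ("w5", "VERB")],
    [("w6", "NOUN"), ("w7", "VERB"), ("w8", "NOUN")],
]


def extract_rules(sentences):
    """Count each distinct POS tag sequence; each sequence is one candidate rule."""
    return Counter(tuple(tag for _, tag in sent) for sent in sentences)


def rule_coverage(rules, test_sentences):
    """Return the fraction of test sentences whose tag sequence is a known rule."""
    if not test_sentences:
        return 0.0
    hits = sum(tuple(tag for _, tag in sent) in rules for sent in test_sentences)
    return hits / len(test_sentences)


rules = extract_rules(tagged_sentences)

# Hypothetical held-out sentences: the first matches a known rule, the second does not.
test_sentences = [
    [("w9", "NOUN"), ("w10", "VERB"), ("w11", "NOUN")],
    [("w12", "ADJ"), ("w13", "NOUN")],
]

print(len(rules))
print(rule_coverage(rules, test_sentences))
```

Counting tag sequences with a `Counter` keeps the sketch short and also records how often each candidate rule occurs, which could help separate common constructions from one-off patterns.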
