Chunker for Gujarati Language Using Hybrid Approach

Chetana Tailor, Bankim Patel

doi:10.1007/978-981-15-6014-9_10

Abstract

For free word order languages, chunking is quite challenging because their phrase structures are relatively unrestricted. A robust chunker benefits many other NLP applications. This paper presents a hybrid chunker for the Gujarati language. Contextual information, in the form of the last two Unicode characters of words and their part-of-speech (POS) tags, is used as the key feature set for developing the chunker with a machine learning approach. Four statistical techniques, namely SVM, CRF, Naive Bayes, and HMM, have been implemented to identify the most appropriate technique for chunking Gujarati text. To further improve performance, linguistic rules have been designed. The final system achieves an accuracy of 98.21%, with precision, recall, and F1 score of 96.42%, 95.62%, and 96.02%, respectively.
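The feature template described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the context window of one word on each side, the feature names (`suf2[...]`, `pos[...]`), the `<PAD>` marker, and the POS tags in the example sentence are all assumptions. "Last two Unicode characters" is taken here to mean the last two code points, which for Gujarati often includes a dependent vowel sign (matra). The block also includes a helper that checks the reported F1 score against the reported precision and recall.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # hypothetical marker for positions outside the sentence


def token_features(
    sentence: List[Tuple[str, str]], i: int, window: int = 1
) -> Dict[str, str]:
    """Features for token i: the last two code points of each word and its
    POS tag, for the token itself and `window` neighbours on each side."""
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(sentence):
            word, pos = sentence[j]
            feats[f"suf2[{offset}]"] = word[-2:]  # last two Unicode code points
            feats[f"pos[{offset}]"] = pos
        else:
            # Assumed padding for neighbours beyond the sentence boundary.
            feats[f"suf2[{offset}]"] = PAD
            feats[f"pos[{offset}]"] = PAD
    return feats


def sentence_features(sentence: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative sentence ("Ram goes to school") with hypothetical POS tags.
    sent = [("રામ", "N_NNP"), ("શાળાએ", "N_NN"), ("જાય", "V_VM"), ("છે", "V_VAUX")]
    for feats in sentence_features(sent):
        print(feats)
    print(round(f1(96.42, 95.62), 2))  # → 96.02
```

Each token becomes a dictionary of string features, a format that CRF toolkits such as sklearn-crfsuite accept directly; for SVM or Naive Bayes the dictionaries would first need to be vectorized, for example with scikit-learn's `DictVectorizer`. Evaluating `f1(96.42, 95.62)` gives about 96.02, consistent with the reported F1 score.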

Full Text