Abstract

Chunking is challenging for free word order languages because their phrase structures are relatively unrestricted. A robust chunker benefits other NLP applications. This paper presents a hybrid chunker for the Gujarati language. Contextual information, in the form of the last two Unicode characters of each word and its part-of-speech (POS) tag, serves as the key feature set for a machine learning based chunker. Four statistical techniques, namely SVM, CRF, Naive Bayes, and HMM, were implemented to identify the most appropriate one for chunking Gujarati text. Linguistic rules were then designed to improve performance further. The final system achieves an accuracy of 98.21%, with precision, recall, and F1 score of 96.42%, 95.62%, and 96.02%, respectively.
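
As an illustration of the feature set described above, the following is a minimal sketch of how suffix and POS features might be extracted per token. It is not the authors' implementation: the function names, the feature keys, and the one-token context window on each side are assumptions, since the abstract does not specify them. Python string slicing operates on Unicode code points, so `word[-2:]` returns the last two code points of a Gujarati word (which may include a dependent vowel sign).

```python
from typing import Dict, List, Union

FeatureDict = Dict[str, Union[str, bool]]


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> FeatureDict:
    """Build a feature dict for the token at index i (hypothetical feature names)."""
    word = tokens[i]
    feats: FeatureDict = {
        "suffix2": word[-2:],  # last two Unicode code points of the word
        "pos": pos_tags[i],    # POS tag of the current token
    }
    # Assumed context window: one token to the left and one to the right.
    if i > 0:
        feats["prev_suffix2"] = tokens[i - 1][-2:]
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_suffix2"] = tokens[i + 1][-2:]
        feats["next_pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str], pos_tags: List[str]) -> List[FeatureDict]:
    """Extract features for every token in one POS-tagged sentence."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

Feature dicts of this shape can be fed to a sequence labeller such as a CRF, or vectorised for an SVM or Naive Bayes classifier. A call such as `sentence_features(["છોકરો", "ઘરે", "જાય", "છે"], ["NN", "NN", "VM", "VAUX"])` would produce one dict per token; the POS tags in that call are illustrative only.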

