Abstract

With only limited electronic resources available, developing a syntactic parser that handles all sentence forms is a challenging and demanding task for any natural language. This paper presents the development of Penn Treebank based statistical syntactic parsers for two South Dravidian languages, namely Kannada and Malayalam. Syntactic parsing is the task of recognizing a sentence and assigning a syntactic structure to it. A syntactic parser is an essential tool for many natural language processing (NLP) applications and for natural language understanding. The well-known Penn Treebank structure was used as the grammar formalism for creating the corpus for the proposed statistical syntactic parsers. Both parsing systems were trained on a Treebank-based corpus consisting of 1,000 carefully constructed Kannada and Malayalam sentences. The corpus is annotated with correct segmentation and Part-Of-Speech (POS) information. We used our own POS tagger generator to assign a tag to every word in the training and test sentences. The proposed syntactic parsers were implemented using supervised machine learning and probabilistic context-free grammar (PCFG) approaches. Training, testing and evaluation were carried out with support vector method (SVM) algorithms. Our experiments show that both systems perform well and achieve very competitive accuracy.
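To illustrate the PCFG idea mentioned in the abstract, the following minimal sketch estimates rule probabilities by relative frequency from Penn Treebank style bracketed trees. It is not the authors' implementation and does not involve the SVM step: the two toy trees, the placeholder tokens `w1` to `w5`, and the function names `parse_bracketed`, `productions` and `estimate_pcfg` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy treebank with placeholder tokens (not data from the paper).
TOY_TREEBANK = [
    "(S (NP (NN w1)) (VP (NP (NN w2)) (VB w3)))",
    "(S (NP (NN w4)) (VP (VB w5)))",
]


def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "a constituent must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def productions(tree):
    """Yield every (lhs, rhs) production used in a tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


def estimate_pcfg(bracketed_trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for text in bracketed_trees:
        for lhs, rhs in productions(parse_bracketed(text)):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


grammar = estimate_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

In this sketch the probabilities of all rules sharing a left-hand side sum to 1, which is the defining property of a PCFG; a real system would also need smoothing and a parsing algorithm to apply the grammar to unseen sentences.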
