Abstract

Parsing, i.e., identifying the underlying hierarchical structure of natural language expressions is important for several natural language processing applications. In recent times Machine Learning (ML) approaches have been developed for this study for many languages. Most of the effective techniques require an annotated corpus of the language for training and validation. For the Manipuri language of the Tibeto-Burman family, neither such a corpus nor a grammar framework to automatically analyse and represent the structure of sentences exists yet. This study proposes a context-Free Grammar (CFG) that provides the framework to represent the structure of Manipuri sentences. This paves the way for parsing Manipuri sentences using CFG-based parsers for various applications and to conveniently build a Treebank for developing ML-based parsers for Manipuri. The rules of the proposed CFG are handcrafted after extensive analysis of the structure of Manipuri sentences. The grammar covers simple, compound, complex and compound-complex sentences. For evaluation, we induce an Earley’s parser with the proposed CFG and test it over a collection of sentences that covers the possible varieties of structure. A recognition rate of 83.20% achieved in these experiments indicates the effectiveness of the proposed grammar.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.