Abstract

In recent years, transition-based parsers have shown promise in both efficiency and accuracy. Although these parsers have been explored extensively for multiple Indian languages, there is still considerable scope for improvement by properly incorporating syntactically relevant information. In this article, we enhance transition-based parsing of Hindi and Urdu by redefining the features and feature extraction procedures previously proposed in the literature on parsing Indian languages. We show empirically that properly incorporating syntactically relevant information such as case marking, complex predication and grammatical agreement in an arc-eager parsing model can significantly improve parsing accuracy. Our experiments show an absolute improvement of about 2% in labeled attachment score (LAS) for both Hindi and Urdu over a competitive baseline that uses rich features such as part-of-speech (POS) tags, chunk tags, cluster ids and lemmas. We also propose heuristics to identify ezafe constructions in Urdu text, which show promising results in parsing these constructions.
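For readers unfamiliar with the arc-eager model mentioned above, the following is a minimal sketch of the standard arc-eager transition system (SHIFT, LEFT-ARC, RIGHT-ARC, REDUCE). It is not the implementation used in this work. The names `ParserState` and `apply_transitions`, the romanized example sentence and the relation labels (`case`, `subj`, `obj`, `root`) are illustrative assumptions, and the transition sequence is written by hand rather than predicted by a classifier.

```python
from dataclasses import dataclass, field


@dataclass
class ParserState:
    stack: list                                   # token indices; top is stack[-1]
    buffer: list                                  # unread token indices; front is buffer[0]
    heads: dict = field(default_factory=dict)     # dependent index -> head index
    labels: dict = field(default_factory=dict)    # dependent index -> relation label


def shift(state):
    # Move the front of the buffer onto the stack.
    state.stack.append(state.buffer.pop(0))


def left_arc(state, label):
    # Attach the stack top as a dependent of the buffer front, then pop it.
    dep = state.stack[-1]
    assert dep != 0 and dep not in state.heads, "LEFT-ARC precondition violated"
    state.stack.pop()
    state.heads[dep] = state.buffer[0]
    state.labels[dep] = label


def right_arc(state, label):
    # Attach the buffer front as a dependent of the stack top, then push it.
    dep = state.buffer.pop(0)
    state.heads[dep] = state.stack[-1]
    state.labels[dep] = label
    state.stack.append(dep)


def reduce_top(state):
    # Pop the stack top; allowed only once it already has a head.
    assert state.stack[-1] in state.heads, "REDUCE precondition violated"
    state.stack.pop()


def apply_transitions(n_tokens, transitions):
    # Index 0 is the artificial ROOT; indices 1..n_tokens-1 are the words.
    state = ParserState(stack=[0], buffer=list(range(1, n_tokens)))
    for name, label in transitions:
        if name == "SHIFT":
            shift(state)
        elif name == "LEFT-ARC":
            left_arc(state, label)
        elif name == "RIGHT-ARC":
            right_arc(state, label)
        elif name == "REDUCE":
            reduce_top(state)
        else:
            raise ValueError(f"unknown transition: {name}")
    return state


# Illustrative (assumed) example: "raam ne seb khaayaa" ('Ram ate an apple'),
# where "ne" is a case marker attached to "raam".
tokens = ["ROOT", "raam", "ne", "seb", "khaayaa"]
sequence = [
    ("SHIFT", None),        # push raam
    ("RIGHT-ARC", "case"),  # raam -> ne
    ("REDUCE", None),       # pop ne
    ("SHIFT", None),        # push seb
    ("LEFT-ARC", "obj"),    # seb <- khaayaa
    ("LEFT-ARC", "subj"),   # raam <- khaayaa
    ("RIGHT-ARC", "root"),  # ROOT -> khaayaa
]
final = apply_transitions(len(tokens), sequence)
for dep in sorted(final.heads):
    print(tokens[dep], "<-", tokens[final.heads[dep]], final.labels[dep])
```

In a trained parser, a classifier would choose each transition from features of the current stack and buffer; information such as the case marker already attached to the stack top is the kind of feature the abstract refers to.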
