Abstract

In recent years, transition-based parsers have shown promise in terms of efficiency and accuracy. Though these parsers have been extensively explored for multiple Indian languages, there is still considerable scope for improvement by properly incorporating syntactically relevant information. In this article, we enhance transition-based parsing of Hindi and Urdu by redefining the features and feature extraction procedures that have been previously proposed in the parsing literature of Indian languages. We propose and empirically show that properly incorporating syntactically relevant information like case marking, complex predication and grammatical agreement in an arc-eager parsing model can significantly improve parsing accuracy. Our experiments show an absolute improvement of ∼2% LAS for parsing of both Hindi and Urdu over a competitive baseline which uses rich features like part-of-speech (POS) tags, chunk tags, cluster ids and lemmas. We also propose some heuristics to identify ezafe constructions in Urdu texts which show promising results in parsing these constructions.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call