Automatic Detection of Arabic Non-Anaphoric Pronouns for Improving Anaphora Resolution

Muhammad Abdul-Mageed

doi:10.1145/1929908.1929913

Abstract

Anaphora resolution is one of the most difficult tasks in NLP. The ability to identify non-referential pronouns before attempting an anaphora resolution task would be significant, since the system would not have to attempt resolving such pronouns and hence end up with fewer errors. In addition, the number of non-referential pronouns has been found to be non-trivial in many domains. The task of detecting non-referential pronouns could also be incorporated into a part-of-speech tagger or a parser, or treated as an initial step in semantic interpretation. In this article, I describe a machine learning method for identifying non-referential pronouns in an annotated subsegment of the Penn Arabic Treebank using three different feature settings. I achieve an accuracy of 97.22% with 52 different features extracted from a small window size of -5/+5 tokens surrounding each potentially non-referential pronoun.

Full Text