Semantic role induction in Persian: An unsupervised approach by using probabilistic models

Parisa Saeedi, Azadeh Shakery, Heshaam Faili

doi:10.1093/llc/fqu044

Abstract

Semantic roles describe the relation between a predicate (typically a verb) and its arguments. Semantic role labeling is a Natural Language Processing task that extracts these relations from sentences. Applications such as machine translation and question answering benefit from this level of semantic analysis. Because creating semantic role-annotated data is an obstacle to developing supervised learning systems, we present a novel unsupervised approach to the semantic role induction task. Our approach is formulated as a clustering method: the argument instances of a verb are clustered into semantic role classes specific to that verb. We present a Bayesian model for learning argument structure from unannotated text and estimate the model parameters with the expectation-maximization (EM) algorithm. Clustering the argument instances of a verb that share semantic and syntactic similarities is a promising approach to learning their semantic roles without supervision. The only linguistic knowledge used to link argument instances to semantic clusters is extracted from a verb valency lexicon. Our evaluation on Persian shows that, on both small and large training datasets, our system outperforms a strong baseline: the approach proposed by Lang and Lapata (2010), adapted to Persian. We use the purity and inverse purity measures to assess the quality of the proposed semantic role clustering method. The results show improvements of about 9.73% and 1.65% in purity and inverse purity, respectively, on the small dataset, and 2.85% and 0.67% on the large dataset.
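Purity and inverse purity, the two measures reported above, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes each argument instance of one verb carries an induced cluster ID and a gold role label, and that both measures are computed over those aligned lists. Purity credits each induced cluster with its most frequent gold role; inverse purity swaps the roles of clusters and gold classes. How the paper aggregates scores across verbs is not stated here, so the function names and the single-verb scope are assumptions.

```python
from collections import Counter, defaultdict


def purity(clusters, gold):
    """Fraction of instances whose cluster's majority gold label matches their own.

    clusters, gold: equal-length sequences, one entry per argument instance
    (hypothetical input format for illustration).
    """
    if len(clusters) != len(gold):
        raise ValueError("clusters and gold must have the same length")
    if not clusters:
        raise ValueError("at least one instance is required")
    # Count gold labels inside each induced cluster.
    by_cluster = defaultdict(Counter)
    for c, g in zip(clusters, gold):
        by_cluster[c][g] += 1
    # Each cluster contributes the size of its largest gold-label overlap.
    return sum(max(counts.values()) for counts in by_cluster.values()) / len(clusters)


def inverse_purity(clusters, gold):
    """Purity with the roles swapped: each gold role credits its best-matching cluster."""
    return purity(gold, clusters)


# Six argument instances of one verb: induced clusters vs. gold roles.
induced = [0, 0, 0, 0, 1, 1]
roles = ["A0", "A0", "A1", "A1", "A1", "A2"]
print(purity(induced, roles))          # 0.5
print(inverse_purity(induced, roles))  # 0.8333333333333334
```

Purity alone can be maximized by putting every instance in its own cluster, and inverse purity by putting all instances in one cluster. Reporting both, as the abstract does, guards against either degenerate solution.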
