Acquisition of Turkish meronym based on classification of patterns

Tuğba Yıldız, Savaş Yıldırım, Banu Diri

doi:10.1007/s10044-015-0516-9

Abstract

Identifying semantic relations in raw text is an important problem in Natural Language Processing. This paper presents a semi-automatic, pattern-based method for extracting part-whole relations. We adopted several lexico-syntactic patterns to uncover the meronymy relation in a Turkish corpus. We prepared the patterns in two ways: the first uses pre-defined patterns taken from the literature, and the second produces patterns automatically by bootstrapping. Whereas the pre-defined patterns are applied directly to the corpus, the bootstrapped patterns must first be discovered from manually prepared, unambiguous seeds. Word pairs are then extracted according to their occurrence in those patterns. In addition, we applied statistical selection to global data obtained from the results of all patterns. These data form a whole-by-part matrix to which several association metrics, such as information gain and T-score, are applied. We examined how these approaches improve system accuracy, especially with respect to the corpus-based approach and the distributional features of words. Finally, we conducted a variety of experiments with a comparative analysis and showed the advantages and disadvantages of the approaches, with promising results.
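The statistical selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `t_scores`, the toy word pairs, and the choice of the common collocation form of the T-score, (observed - expected) / sqrt(observed), are all assumptions made for illustration. The sketch counts (whole, part) pairs as the pattern matcher might emit them, which amounts to a sparse whole-by-part matrix, and scores each cell against the count expected if wholes and parts were independent.

```python
from collections import Counter
from math import sqrt


def t_scores(pairs):
    """Score (whole, part) pairs with a collocation-style T-score.

    `pairs` is an iterable of (whole, part) tuples, one per pattern match.
    The formula is an assumed, commonly used approximation, not one taken
    from the paper.
    """
    pair_counts = Counter(pairs)   # sparse whole-by-part matrix
    whole_totals = Counter()       # row sums
    part_totals = Counter()        # column sums
    for (whole, part), n in pair_counts.items():
        whole_totals[whole] += n
        part_totals[part] += n
    total = sum(pair_counts.values())

    scores = {}
    for (whole, part), observed in pair_counts.items():
        # Count expected if whole and part occurred independently.
        expected = whole_totals[whole] * part_totals[part] / total
        scores[(whole, part)] = (observed - expected) / sqrt(observed)
    return scores


# Toy pairs invented for illustration (araba = car, ev = house, kapı = door).
pairs = [("araba", "motor")] * 3 + [("araba", "kapı")] + [("ev", "kapı")] * 2
scores = t_scores(pairs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Pairs with higher scores co-occur more often than chance would predict, so a threshold or a top-k cut on such a ranking is one plausible way to select candidate part-whole pairs.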

Full Text