Performance of Turkish Information Retrieval: Evaluating the Impact of Linguistic Parameters and Compound Nouns

Hatem Haddad, Chedi Bechikh Ali

doi:10.1007/978-3-642-54903-8_32

Abstract

Turkish is an agglutinative language in which linguistic parameters can significantly affect information retrieval performance. In this paper, several Turkish linguistic parameters (truncation, stemming, stop words, etc.) are studied and their impact on the performance of an information retrieval system is investigated. Three fixed-length word truncations, at 3, 4 and 5 characters, are studied. These results are compared against the Snowball and Zemberek stemmers. Moreover, the effect of using compound nouns, in addition to simple keywords, to index queries and documents is studied. In the experimental part, the Milliyet test collection is tested with three information retrieval models. Performance is compared using the traditional information retrieval metrics and the bpref metric, since the test collection is built on incomplete relevance judgments.
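As an illustration of fixed-length truncation, the following minimal sketch keeps only the first n characters of each token, for n = 3, 4 and 5. The function names and the example words are assumptions made for illustration and are not taken from the system evaluated in the paper; tokenization, case folding and stop-word removal are assumed to happen elsewhere.

```python
from typing import Iterable, List


def truncate_token(token: str, length: int) -> str:
    """Keep the first `length` characters of a token.

    Tokens shorter than `length` are returned unchanged. Python slices
    strings by code point, so Turkish letters such as 'ç', 'ğ' and 'ı'
    each count as one character.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return token[:length]


def truncate_tokens(tokens: Iterable[str], length: int) -> List[str]:
    """Apply fixed-length truncation to every token in a sequence."""
    return [truncate_token(t, length) for t in tokens]


if __name__ == "__main__":
    # Hypothetical example: inflected forms of 'kitap' (book) and 'ev' (house).
    words = ["kitaplardan", "kitabın", "evlerimizde"]
    for n in (3, 4, 5):
        print(n, truncate_tokens(words, n))
```

In this example, truncation at 5 characters maps "kitaplardan" to "kitap", whereas "kitabın" becomes "kitab" because of consonant mutation, which is one reason truncation and stemming can index the same word differently.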

Full Text