Abstract

Turkish is an agglutinative language, so linguistic processing choices can significantly affect information retrieval performance. This paper studies several Turkish linguistic parameters (truncation, stemming, stop words, etc.) and investigates their impact on the performance of an information retrieval system. Word truncation at three fixed lengths (3, 4 and 5 characters) is examined, and the results are compared with those obtained using the Snowball and Zemberek stemmers. The effect of indexing queries and documents with compound nouns, in addition to simple keywords, is also studied. In the experiments, the Milliyet test collection is evaluated with three information retrieval models. Performance is compared using traditional information retrieval metrics and the bpref metric, since the test collection is built on incomplete relevance judgments.
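As an illustration only, the two techniques named above that are easiest to state precisely, fixed-length truncation and bpref, can be sketched as follows. This is a minimal sketch and not the paper's implementation: the function names `truncate` and `bpref` and the example words and document ids are hypothetical, and the bpref variant shown (normalising by min(R, N), as commonly done in TREC-style evaluation) is an assumption, since the abstract does not say which formulation was used.

```python
def truncate(word, n):
    """Fixed-length truncation: keep only the first n characters of a word."""
    return word[:n]


def bpref(ranking, relevant, nonrelevant):
    """bpref over a ranked list, ignoring unjudged documents.

    ranking     -- list of document ids, best first
    relevant    -- set of ids judged relevant (size R)
    nonrelevant -- set of ids judged non-relevant (size N)

    Assumed variant: each retrieved relevant document contributes
    1 - (judged non-relevant documents ranked above it, capped) / min(R, N).
    Relevant documents that are not retrieved contribute 0.
    """
    R = len(relevant)
    N = len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)
    nonrel_seen = 0  # judged non-relevant documents seen so far
    total = 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            if denom == 0:
                # No judged non-relevant documents: nothing to penalise.
                total += 1.0
            else:
                total += 1.0 - min(nonrel_seen, denom) / denom
        # Documents in neither set are unjudged and are skipped.
    return total / R


# Hypothetical example data, not taken from the Milliyet collection.
word = "kitaplar"  # "books"
prefixes = [truncate(word, n) for n in (3, 4, 5)]

ranking = ["d1", "x", "d2", "d3"]  # "x" has no relevance judgment
score = bpref(ranking, relevant={"d1", "d3"}, nonrelevant={"d2"})
```

Because unjudged documents are skipped rather than counted as non-relevant, a measure of this kind is less sensitive to gaps in the relevance judgments than precision-based metrics, which is the motivation the abstract gives for reporting bpref.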
