Abstract

This paper describes a study of a Turkish-English cross-language information retrieval (CLIR) system. One of the biggest obstacles in CLIR studies is access to a bilingual parallel corpus, so the first step of this study was to construct a parallel Turkish-English corpus. The resulting corpus contains 1801 parallel documents and was divided into two parts: one for training the system and one for testing it. Latent semantic indexing (LSI) techniques were applied to the training set to capture the relations between the two languages. After training, we ran a set of test queries to measure the effectiveness of LSI-based retrieval on the Turkish-English parallel corpus. Our experimental results show that LSI-based CLIR outperforms non-LSI-based retrieval, with retrieval success rates of 69% and 26%, respectively.
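
The abstract does not spell out how LSI is trained on the parallel corpus, so the following is only a minimal sketch of one common cross-language LSI setup, not the authors' implementation. It assumes that each Turkish document is merged with its English counterpart into a single training document, that a truncated SVD of the resulting term-document matrix defines the shared latent space, and that queries and test documents are folded into that space and compared by cosine similarity. The toy sentences, the rank `k = 3`, and the helper names (`term_vector`, `fold_in`, `rank`) are all hypothetical.

```python
import numpy as np

# Toy parallel training pairs (hypothetical data, not taken from the paper).
train_pairs = [
    ("kedi süt içer", "cat drinks milk"),
    ("köpek kemik yer", "dog eats bone"),
    ("kedi balık yer", "cat eats fish"),
    ("köpek su içer", "dog drinks water"),
]

def tokenize(text):
    # Whitespace tokenization; a real system would also stem or lemmatize.
    return text.lower().split()

# Assumption: each training document is the Turkish text merged with its
# English translation, so both vocabularies share one term-document matrix.
merged_docs = [tr + " " + en for tr, en in train_pairs]
vocab = sorted({t for doc in merged_docs for t in tokenize(doc)})
index = {t: i for i, t in enumerate(vocab)}

def term_vector(text):
    # Raw term-frequency vector over the joint vocabulary;
    # out-of-vocabulary terms are ignored.
    v = np.zeros(len(vocab))
    for t in tokenize(text):
        if t in index:
            v[index[t]] += 1.0
    return v

# Term-document matrix: rows are terms, columns are merged training documents.
A = np.column_stack([term_vector(doc) for doc in merged_docs])

# Truncated SVD keeps the k strongest latent dimensions (k is an assumption).
k = 3
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]

def fold_in(text):
    # Project a monolingual text into the shared latent space.
    return (term_vector(text) @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank(query, docs):
    # Return document indices ordered from most to least similar to the query.
    q = fold_in(query)
    scores = [cosine(q, fold_in(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# A Turkish query against English-only documents: no surface terms overlap,
# so plain term matching would score both documents zero.
english_docs = ["cat drinks milk", "dog eats bone"]
ranking = rank("kedi süt", english_docs)
print(ranking)
```

In this sketch a Turkish query can retrieve an English document even though they share no words, because translation pairs that co-occur in the merged training documents end up close together in the latent space. A non-LSI baseline that matches raw terms has no such bridge between the two vocabularies.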
